PIR with Low Storage Overhead:
Coding instead of Replication
Abstract

Private information retrieval (PIR) protocols allow a user to retrieve a data item from a database without revealing any information about the identity of the item being retrieved. Specifically, in information-theoretic k-server PIR, the database is replicated among k non-communicating servers, and each server learns nothing about the item retrieved by the user. The cost of PIR protocols is usually measured in terms of their communication complexity, which is the total number of bits exchanged between the user and the servers. However, another important cost parameter is the storage overhead, which is the ratio between the total number of bits stored on all the servers and the number of bits in the database. Since single-server information-theoretic PIR is impossible, the storage overhead of all existing PIR protocols is at least 2 (or k, in the case of k-server PIR).

In this work, we show that information-theoretic PIR can be achieved with storage overhead arbitrarily close to the optimal value of 1, without sacrificing the communication complexity. Specifically, we prove that all known k-server PIR protocols can be efficiently emulated, while preserving both privacy and communication complexity but significantly reducing the storage overhead. To this end, we distribute the n bits of the database among s + r servers, each storing n/s coded bits (rather than replicas). Notably, our coding scheme remains the same, regardless of the specific k-server PIR protocol being emulated. For every fixed k, the resulting storage overhead $(s+r)/s$ approaches 1 as s grows; explicitly we have $r \le k\sqrt{s}\,(1+o(1))$. Moreover, in the special case k = 2, the storage overhead is only $1 + \frac{1}{s}$. In order to achieve these results, we introduce and study a new kind of binary linear codes, called here k-server PIR codes. We then show how such codes can be constructed from Steiner systems, from one-step majority-logic decodable codes, from constant-weight codes, and from certain locally recoverable codes. We also establish several bounds on the parameters of k-server PIR codes, and tabulate the results for all $s \le 32$ and $k \le 16$. Finally, we briefly discuss extensions of our results to nonbinary alphabets, to robust PIR, and to t-private PIR.


I. Introduction 


Private information retrieval protocols make it possible to retrieve a data item from a database without disclosing any information about the identity of the item being retrieved. The notion of private information retrieval (PIR) was first introduced by Chor, Goldreich, Kushilevitz, and Sudan and has attracted considerable attention since (see, e.g., [7], [12], [25], [29]–[31] and references therein). The classic PIR model of Chor et al., which we adopt in this paper, views the database as a binary string $x = (x_1, \ldots, x_n) \in \{0,1\}^n$ and assumes that the user wishes to retrieve a single bit $x_i$ without revealing any information about the index i. A naive solution for the user (hereinafter, often called Alice) is to download the entire database x. Chor et al. show that in the case of a single database stored on a single server, this solution is essentially the best possible: any PIR protocol will require $\Omega(n)$ bits of communication between the user and the server. In order to achieve sublinear communication complexity, Chor, Goldreich, Kushilevitz, and Sudan proposed replicating the database on several servers that do not communicate with each other. They showed that having two replicas makes it possible to reduce the communication cost to $O(n^{1/3})$, while $k \ge 3$ servers can achieve $O((k^2 \log k)\, n^{1/k})$ communication complexity.

Following the seminal work of Chor et al., the communication complexity of k-server PIR has been further reduced in a series of groundbreaking papers. Ambainis [1] generalized the methods of Chor et al. to obtain a communication cost of $O(n^{1/(2k-1)})$ for all $k \ge 2$. This result remained the best known for a while until the $O(n^{1/(2k-1)})$-complexity barrier was finally broken by Beimel, Ishai, Kushilevitz, and Raymond. Five years later came the remarkable work of Yekhanin, who constructed a 3-server PIR scheme with subpolynomial communication cost, assuming the infinitude of Mersenne primes. Shortly thereafter, Efremenko [13] gave an unconditional k-server PIR scheme with subpolynomial complexity for all $k \ge 3$. The recent paper of Dvir and Gopi [12] shows how to achieve the same complexity as in [13] with only two servers.

All this work follows the original idea, first proposed by Chor et al., of replicating the database in order to reduce the communication cost. However, this approach neglects another cost parameter: the storage overhead, defined as the ratio between the total number of bits stored on all the servers and the number of bits in the database. Clearly, the storage overhead of the PIR protocols discussed above is $k \ge 2$. If the database is very large, the necessity to store several replicas of it could be untenable for some applications. Thus, in this paper, we consider the following question. Can one achieve PIR with low communication cost but without doubling (or worse) the number of bits we need to store?

This question has been settled in the affirmative in [19] for the case where one is willing to replace information-theoretic guarantees of privacy by computational guarantees. Such computational PIR is by now well studied; see, e.g., [19] for more information. However, in this paper, we consider only information-theoretic PIR, which provides the strongest form of privacy. That is, even computationally unbounded servers should not gain any information from the user queries. Somewhat surprisingly, despite the impossibility proof of Chor et al., the answer to our question turns out to be affirmative also in the case of information-theoretic PIR. Our results do not contradict those of Chor et al. To achieve information-theoretic privacy, one does need at least two non-communicating servers. However, these servers do not have to hold the entire database; they can store only parts of it. We show that if these parts are judiciously encoded, rather than replicated, the overall storage overhead can be reduced.


A. Our Contributions 

We show that all known k-server information-theoretic PIR protocols can be efficiently emulated, while preserving their privacy and communication-complexity guarantees (up to a constant), but significantly reducing the storage overhead. In fact, for any fixed k and any $\epsilon > 0$, we can reduce the storage overhead to under $1 + \epsilon$.

In order to achieve these results, we first partition the database into s parts and distribute these parts among non-communicating servers, so that every server stores n/s bits. Why do we partition the database in this manner? The main reason is that such a partition is necessary to reduce the storage overhead. If every server has to store all n bits of the database, then the storage overhead cannot be reduced below $k \ge 2$. However, in practice, there may be other compelling reasons. For example, the database may simply be too large to fit on a single server, or it may need to be stored in a distributed manner for security purposes. We observe that the number of parts s need not be very large. With s = 2 parts, we can already achieve significant savings in storage overhead. With s = 16 parts, we get a storage overhead of 1.06 (for 2-server PIR protocols).
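As a quick check of these two figures, the worked arithmetic below assumes the arrangement the paper uses for 2-server protocols: s servers holding the s parts plus a single parity server, each storing n/s bits, so that the overhead is 1 + 1/s as stated in the abstract. It is a sanity check under that assumption, not an additional result.

```latex
\[
\text{overhead}(s) \;=\; \frac{(s+1)\cdot n/s}{n} \;=\; 1 + \frac{1}{s},
\qquad
\text{overhead}(2) = \frac{3}{2} = 1.5,
\qquad
\text{overhead}(16) = \frac{17}{16} = 1.0625 \approx 1.06 .
\]
```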

Given a partition of the database into s parts, our construction consists of two main ingredients: 1) an existing k-server PIR protocol in which the servers’ responses are a linear function of the database bits, and 2) a binary linear code, which we call a k-server PIR code, with a special property to be defined shortly. We note that the first requirement is very easy to satisfy: all the existing PIR protocols known (to us) are linear in this fashion. Thus our primary focus in this paper is on the construction of k-server PIR codes.

The defining property of a k-server PIR code is this: for every message bit $u_i$, there exist k disjoint sets of coded bits from which $u_i$ can be uniquely recovered (see Section III for a formal definition). Although this property is reminiscent of locally recoverable codes, recently introduced in [15], there are important differences. In locally recoverable codes, we wish to guarantee that every message bit $u_i$ can be recovered from a small set of coded bits, and only one such recovery set is needed. Here, we wish to have many disjoint recovery sets for every message bit, and we do not care about their size. To the best of our knowledge, codes with this property have not been previously studied, and they may be of independent interest.

In this paper, we show how k-server PIR codes can be constructed from Steiner systems, from one-step majority-logic decodable codes, and from constant-weight codes. We give an optimal construction of such codes for the case where the number of parts s is small. We also establish several bounds on the parameters of general k-server PIR codes, and tabulate these parameters for all $s \le 32$ and $k \le 16$. Finally, we briefly discuss extensions of our results to nonbinary alphabets, to robust PIR, and to t-private PIR.

B. Related work

Several previous works construct coded schemes for the purpose of fast or private retrieval.

The first work we know of on coded private retrieval is the recent work by Shah et al. [23]. The authors showed how to encode files in multiple servers with very low communication complexity. However, their constructions require an exponentially large number of servers, which may depend on the number of files or their size. In another recent work, Chan et al. studied the tradeoff between storage overhead and communication complexity, though only for setups in which the size of each file is relatively large. An approach similar to ours was studied by Augot et al., where the authors also partitioned the database into several parts in order to avoid repetition and thereby reduce the storage overhead. However, their construction works only for the PIR scheme using the multiplicity codes of Kopparty et al. [18], and they did not encode the parts of the database as we do in this work.

Batch codes [17] are another method to store coded data in a distributed storage system for the purpose of fast retrieval of multiple bits. Under this setup, the database is encoded into an m-tuple of strings, called buckets, such that each batch of k bits from the database can be recovered by reading at most some predetermined t bits from each bucket. They are also useful for trading storage overhead in exchange for load balancing or for lowering the computational complexity in private information retrieval. Another recent study of batch codes appears in [11].


C. Organization 

The rest of this paper is organized as follows. In Section II we formally define the PIR schemes studied in the paper, namely the conventional PIR protocol and coded PIR protocols. In Section III we present our general construction of coded PIR protocols and define the requirements on the k-server PIR codes that are used in this protocol. Section IV studies several constructions of k-server PIR codes. In Section V we study the storage overhead of k-server PIR codes when the values of s and k are small, and we then study the asymptotic behavior when either s or k is large. Finally, Section IX concludes the paper.


II. Definitions and Preliminaries

In this section we formally define the PIR protocols we study in the paper. A linear code over $GF(q)$ of length n and dimension k will be denoted by $[n,k]_q$, or by $[n,k,d]_q$ where d specifies the minimum distance of the code. In case the code is binary we will omit the field notation. For a positive integer n, the notation $[n]$ will refer to the set $\{1, \ldots, n\}$. We denote by $e_i$ the vector with 1 in its i-th position and 0 elsewhere. Let us revisit and rephrase the formal definition of a PIR scheme, based upon the definitions of Chor et al. and [25].


Definition 1. A k-server PIR scheme consists of the following:

1) k servers $S_1, \ldots, S_k$, each of which stores a length-n database x.

2) A user (Alice) $U$ who wants to retrieve $x_i$, for $i \in [n]$, without revealing i.

A k-server PIR protocol is a triplet of algorithms $\mathcal{P} = (\mathcal{Q}, \mathcal{A}, \mathcal{C})$ consisting of the following steps:

1) Alice flips coins and, based on the coin flips and i, invokes $\mathcal{Q}(k, n; i)$ to generate a randomized k-tuple of queries $(q_1, \ldots, q_k)$ of some predetermined fixed length. For $j \in [k]$, the query $q_j$ will also be denoted by $\mathcal{Q}_j(k, n; i)$.

2) For $j \in [k]$, she sends the query $q_j$ to the j-th server $S_j$.

3) The j-th server, for $j \in [k]$, responds with an answer $a_j = \mathcal{A}(k, j, x, q_j)$ of some fixed length.

4) Finally, Alice computes her output by applying the reconstruction algorithm $\mathcal{C}(k, n; i, a_1, \ldots, a_k)$.

The protocol should satisfy the following requirements:

• Privacy - Each server learns no information about i. Formally, for any k, n, $i_1, i_2 \in [n]$, and a server $j \in [k]$, the distributions $\mathcal{Q}_j(k, n; i_1)$ and $\mathcal{Q}_j(k, n; i_2)$ are identical, where the distribution is over the coin flips in Step 1 of the PIR protocol.

• Correctness - For each k, n, $x \in \{0,1\}^n$, and $i \in [n]$, the user deterministically outputs the correct value of $x_i$, that is, $\mathcal{C}(k, n; i, a_1, \ldots, a_k) = x_i$.

We follow the common figure of merit and evaluate the storage efficiency of the system by its overhead [23]. Namely, the storage overhead of the system is the ratio between the total number of bits stored in the system and the number of information bits. For example, the storage overhead of a k-server PIR scheme is k.

Another special property of PIR protocols which will be used in our constructions is linearity. This property is formally defined as follows.


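Definition 2. A k-server PIR protocol $\mathcal{P} = (\mathcal{Q}, \mathcal{A}, \mathcal{C})$ is said to be a linear PIR protocol if for every n, $j \in [k]$, $\mathbf{x}_1, \mathbf{x}_2 \in \{0,1\}^n$, and a query $q_j$, the following property holds:

$$\mathcal{A}(k, j, \mathbf{x}_1 + \mathbf{x}_2, q_j) = \mathcal{A}(k, j, \mathbf{x}_1, q_j) + \mathcal{A}(k, j, \mathbf{x}_2, q_j).$$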


Many, if not all, existing PIR protocols satisfy this linearity property; see for example [12], [29]–[31]. Lastly, we assume that the algorithm $\mathcal{A}$ is public knowledge in the sense that every server can compute the response $\mathcal{A}(k, j, x, q)$ for any $j \in [k]$, database x, and query q.

Before formally defining the coded version of a PIR scheme, we demonstrate the main ideas in the next example.


Example 1. Consider the following 2-server PIR scheme where each server stores an n-bit database x and Alice wants to read the i-th bit $x_i$, for some $i \in [n]$. Alice chooses uniformly at random a vector $a \in \{0,1\}^n$. The first server receives the query a and responds with the bit $a \cdot x$. The second server receives the query $a + e_i$ and responds with the bit $(a + e_i) \cdot x$; see Fig. 1. Alice receives these two bits and their sum gives the i-th bit $x_i$, since

$$a \cdot x + (a + e_i) \cdot x = a \cdot x + a \cdot x + e_i \cdot x = x_i.$$

If the servers do not communicate with each other then, since the vector a is chosen uniformly at random, the value of i remains private. Moreover, the servers’ responses are linear functions of the stored data, and thus the protocol is a linear PIR protocol. Alice had to transmit 2n bits and 2 bits were received, so a total of 2n + 2 bits were communicated. The storage overhead of this scheme is 2. Note also that if one of the servers fails then it is possible to retrieve the database x from the other surviving server.
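The 2-server scheme just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`dot`, `make_queries`, `answer`, `retrieve`) are hypothetical, bits are stored as Python integers in a list, and indices are 0-based rather than 1-based as in the text.

```python
import secrets

def dot(a, x):
    """Inner product over GF(2) of two equal-length bit lists."""
    return sum(ai & xi for ai, xi in zip(a, x)) % 2

def make_queries(n, i):
    """Return (a, a + e_i) for a uniformly random a in {0,1}^n (i is 0-based)."""
    a = [secrets.randbelow(2) for _ in range(n)]
    b = list(a)
    b[i] ^= 1  # adding e_i over GF(2) flips position i
    return a, b

def answer(x, q):
    """Each server replies with the single bit q . x."""
    return dot(q, x)

def retrieve(x, i):
    """Run the protocol against two replicas of x and return bit i."""
    q1, q2 = make_queries(len(x), i)
    return answer(x, q1) ^ answer(x, q2)
```

Each query on its own is a uniformly random vector, which is why a single server learns nothing about i; the two answers differ exactly by the contribution of $e_i \cdot x = x_i$.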

Now, assume that the database x is partitioned into two equal parts of n/2 bits each, $\mathbf{x}_1$ and $\mathbf{x}_2$, where $\mathbf{x}_1 = (x_1, \ldots, x_{n/2})$ and $\mathbf{x}_2 = (x_{n/2+1}, \ldots, x_n)$. The database is stored on three servers. The first server stores $\mathbf{x}_1$, the second stores $\mathbf{x}_2$, and the third one is a parity server which stores $\mathbf{x}_1 + \mathbf{x}_2$. If Alice wants to read the i-th bit, where $i \in [n/2]$, she first chooses uniformly at random a vector $a \in \{0,1\}^{n/2}$. The first server receives the query a and responds with the bit $a \cdot \mathbf{x}_1$. The second server receives the query $a + e_i$ and responds with the bit $(a + e_i) \cdot \mathbf{x}_2$, and the third server receives the query $a + e_i$ and responds with the bit $(a + e_i) \cdot (\mathbf{x}_1 + \mathbf{x}_2)$. Alice receives those three bits and calculates the bit $x_i$ according to the sum

$$a \cdot \mathbf{x}_1 + (a + e_i) \cdot \mathbf{x}_2 + (a + e_i) \cdot (\mathbf{x}_1 + \mathbf{x}_2) = a \cdot \mathbf{x}_1 + (a + e_i) \cdot \mathbf{x}_1 = x_i.$$

It is clear that both schemes keep the privacy of i. In the first scheme, the number of communicated bits is 2n + 2, while in the coded scheme it is 3n/2 + 3. The storage overhead was improved from 2 to 3/2, and both schemes can tolerate a single server failure. However, we note that the coded scheme requires one more server. □

Fig. 1. Alice sends $q_1 = a$ and $q_2 = a + e_i$ to the servers. The servers respond with $a \cdot x$ and $(a + e_i) \cdot x$, and Alice recovers $x_i$ as their sum. The value of i remains private as the vector a is chosen uniformly at random.
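The coded three-server variant of Example 1 can be sketched in the same style. The function names are hypothetical and indices are 0-based. The example in the text only treats a bit from the first half; the branch for the second half below is an assumption obtained by swapping the roles of the first two servers, and it is checked by the same algebra.

```python
import secrets

def dot(q, x):
    """Inner product over GF(2) of two equal-length bit lists."""
    return sum(a & b for a, b in zip(q, x)) % 2

def encode(x):
    """Split x into halves x1, x2 and add a parity server storing x1 + x2."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    parity = [u ^ v for u, v in zip(x1, x2)]
    return [x1, x2, parity]

def retrieve_coded(servers, i):
    """Retrieve bit i (0-based) of the full database from the three servers."""
    half = len(servers[0])
    part, pos = divmod(i, half)
    a = [secrets.randbelow(2) for _ in range(half)]
    b = list(a)
    b[pos] ^= 1  # b = a + e_pos
    # First half: servers receive (a, a + e, a + e), as in Example 1.
    # Second half (assumed symmetric case): servers receive (a + e, a, a + e).
    queries = [a, b, b] if part == 0 else [b, a, b]
    answers = [dot(q, c) for q, c in zip(queries, servers)]
    return answers[0] ^ answers[1] ^ answers[2]

def storage_overhead(servers, n):
    """Total number of stored bits divided by the database size."""
    return sum(len(c) for c in servers) / n
```

With an 8-bit database, the three servers hold 4 bits each, so `storage_overhead` returns 1.5, matching the 3/2 quoted in the example.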


One may claim that the improvement in the last example is the result of using three instead of two servers. This is indeed correct. However, assume that each server can store only n/2 bits. Then the database x will have to be stored over two servers and each of them would have to be replicated, resulting in a total of four servers instead of three. Furthermore, the number of communicated bits would still remain the same, 2n + 2. Thus, under this constraint, we can claim that we improved both the number of servers and the number of communicated bits.

We are now ready to extend the definition of a PIR scheme to its coded version.

Definition 3. An (m, s)-server coded PIR scheme consists of the following:

1) A length-n database x which is partitioned into s parts $\mathbf{x}_1, \ldots, \mathbf{x}_s$, each of length n/s.

2) m servers $S_1, \ldots, S_m$, where for $j \in [m]$ the coded data $\mathbf{c}_j$, stored in the j-th server, is a function of $\mathbf{x}_1, \ldots, \mathbf{x}_s$.

3) A user (Alice) $U$ who wants to retrieve the i-th bit from the database x, without revealing i.

An (m, s)-server coded PIR protocol is a triplet of algorithms $\mathcal{P}^* = (\mathcal{Q}^*, \mathcal{A}^*, \mathcal{C}^*)$ consisting of the following steps:

1) Alice flips coins and, based on the coin flips and i, invokes $\mathcal{Q}^*(m, s, n; i)$ to generate a randomized m-tuple of queries $(q_1, \ldots, q_m)$ of predetermined fixed length.

2) For $j \in [m]$, she sends the query $q_j$ to the j-th server $S_j$.

3) The j-th server, for $j \in [m]$, responds with an answer $a_j = \mathcal{A}^*(m, s, j, \mathbf{c}_j, q_j)$.

4) Finally, Alice computes her output by applying the reconstruction algorithm $\mathcal{C}^*(m, s, n; i, a_1, \ldots, a_m)$.

The protocol should satisfy the privacy and correctness properties as stated in Definition 1.

The next section discusses the construction of coded PIR schemes based upon existing linear PIR protocols. 

III. Coded PIR Schemes 

In this section we will give a general method to construct coded PIR schemes. A key point in the construction of 
coded PIR protocols is to use existing PIR protocols and emulate them in the coded setup. We first give a detailed 
example that demonstrates the main principles of the construction. 

Example 2. Assume there exists a 3-server linear PIR protocol $\mathcal{P}(\mathcal{Q}, \mathcal{A}, \mathcal{C})$ and a length-n database x. Assume also that each server can store at most n/4 bits. If one wishes to invoke the PIR protocol $\mathcal{P}(\mathcal{Q}, \mathcal{A}, \mathcal{C})$, then first the database x will be partitioned into four parts $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4$. Each of the four parts will then be stored on three servers, so that it is possible to invoke the 3-server PIR protocol. This results in 12 servers, each storing n/4 bits, and thus the storage overhead is 3. We will show how it is possible to accomplish the same task with storage overhead 2, that is, with only 8 instead of 12 servers. Namely, we construct an (8,4)-server coded PIR protocol $\mathcal{P}^*(\mathcal{Q}^*, \mathcal{A}^*, \mathcal{C}^*)$.

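We use a similar partition of the database x into four parts $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4$ and encode them into 8 servers as follows. The j-th server, for $j \in [8]$, stores the coded data $\mathbf{c}_j$ as follows:

$$\mathbf{c}_1 = \mathbf{x}_1, \quad \mathbf{c}_2 = \mathbf{x}_2, \quad \mathbf{c}_3 = \mathbf{x}_3, \quad \mathbf{c}_4 = \mathbf{x}_4,$$
$$\mathbf{c}_5 = \mathbf{x}_1 + \mathbf{x}_2, \quad \mathbf{c}_6 = \mathbf{x}_2 + \mathbf{x}_3, \quad \mathbf{c}_7 = \mathbf{x}_3 + \mathbf{x}_4, \quad \mathbf{c}_8 = \mathbf{x}_4 + \mathbf{x}_1.$$

In matrix notation, these equations can be written in the following way:

$$(\mathbf{c}_1, \ldots, \mathbf{c}_8) = (\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4) \cdot \begin{pmatrix} 1&0&0&0&1&0&0&1 \\ 0&1&0&0&1&1&0&0 \\ 0&0&1&0&0&1&1&0 \\ 0&0&0&1&0&0&1&1 \end{pmatrix}.$$

Thus, we encode using an [8,4,3] linear code where the last matrix is its generator matrix in systematic form.

Assume Alice seeks to read the i-th bit from the first part of the database x, i.e., the bit $x_{1,i}$, or $x_i$, where $i \in [n/4]$. She first invokes the algorithm $\mathcal{Q}$ to receive the following three queries:

$$\mathcal{Q}(3, n/4; i) = (q_1, q_2, q_3).$$

Then, she assigns the 8 queries of the algorithm $\mathcal{Q}^*$ to the 8 servers as follows:

$$\mathcal{Q}^*(8, 4, n; i) = (q_1, q_2, q_3, q_3, q_2, q_2, q_3, q_3).$$

Next, she sends these queries to the servers, which respond with the answers listed in Table I.

TABLE I
PIR Protocol for retrieving from the first server

Server | Query | Response
1 | $q_1$ | $a_1 = \mathcal{A}^*(8, 4, 1, \mathbf{c}_1, q_1) = \mathcal{A}(3, 1, \mathbf{x}_1, q_1)$
2 | $q_2$ | $a_2 = \mathcal{A}^*(8, 4, 2, \mathbf{c}_2, q_2) = \mathcal{A}(3, 2, \mathbf{x}_2, q_2)$
4 | $q_3$ | $a_4 = \mathcal{A}^*(8, 4, 4, \mathbf{c}_4, q_3) = \mathcal{A}(3, 3, \mathbf{x}_4, q_3)$
5 | $q_2$ | $a_5 = \mathcal{A}^*(8, 4, 5, \mathbf{c}_5, q_2) = \mathcal{A}(3, 2, \mathbf{c}_5 = \mathbf{x}_1 + \mathbf{x}_2, q_2)$
8 | $q_3$ | $a_8 = \mathcal{A}^*(8, 4, 8, \mathbf{c}_8, q_3) = \mathcal{A}(3, 3, \mathbf{c}_8 = \mathbf{x}_4 + \mathbf{x}_1, q_3)$

Due to the linearity property of the protocol $\mathcal{P}$, Alice can calculate the following information from the second and fifth servers

$$a'_2 = a_2 + a_5 = \mathcal{A}(3, 2, \mathbf{x}_2, q_2) + \mathcal{A}(3, 2, \mathbf{c}_5 = \mathbf{x}_1 + \mathbf{x}_2, q_2) = \mathcal{A}(3, 2, \mathbf{x}_1, q_2),$$

and from the fourth and eighth servers

$$a'_3 = a_4 + a_8 = \mathcal{A}(3, 3, \mathbf{x}_4, q_3) + \mathcal{A}(3, 3, \mathbf{c}_8 = \mathbf{x}_4 + \mathbf{x}_1, q_3) = \mathcal{A}(3, 3, \mathbf{x}_1, q_3).$$

She also assigns $a'_1 = a_1$. Finally, Alice retrieves the value of $x_{1,i}$ by applying the reconstruction algorithm

$$\mathcal{C}^*(8, 4, n; i, a_1, \ldots, a_8) = \mathcal{C}(3, n/4; i, a'_1, a'_2, a'_3)$$
$$= \mathcal{C}(3, n/4; i, \mathcal{A}(3, 1, \mathbf{x}_1, q_1), \mathcal{A}(3, 2, \mathbf{x}_1, q_2), \mathcal{A}(3, 3, \mathbf{x}_1, q_3))$$
$$= x_{1,i} = x_i.$$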

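Now, assume that Alice wants to retrieve the i-th bit from the second part of the database, $x_{2,i}$, or $x_{n/4+i}$, for $i \in [n/4]$. As in the first case she invokes the algorithm $\mathcal{Q}$ and assigns

$$\mathcal{Q}^*(8, 4, n; i) = (q_2, q_1, q_3, q_3, q_2, q_3, q_3, q_3),$$

where $q_1, q_2, q_3$ are calculated according to $\mathcal{Q}(3, n/4; i) = (q_1, q_2, q_3)$. However, the queries to the servers will be slightly different, as summarized in Table II.

TABLE II
PIR Protocol for retrieving from the second server

Server | Query | Response
1 | $q_2$ | $a_1 = \mathcal{A}^*(8, 4, 1, \mathbf{c}_1, q_2) = \mathcal{A}(3, 2, \mathbf{x}_1, q_2)$
2 | $q_1$ | $a_2 = \mathcal{A}^*(8, 4, 2, \mathbf{c}_2, q_1) = \mathcal{A}(3, 1, \mathbf{x}_2, q_1)$
3 | $q_3$ | $a_3 = \mathcal{A}^*(8, 4, 3, \mathbf{c}_3, q_3) = \mathcal{A}(3, 3, \mathbf{x}_3, q_3)$
5 | $q_2$ | $a_5 = \mathcal{A}^*(8, 4, 5, \mathbf{c}_5, q_2) = \mathcal{A}(3, 2, \mathbf{c}_5 = \mathbf{x}_1 + \mathbf{x}_2, q_2)$
6 | $q_3$ | $a_6 = \mathcal{A}^*(8, 4, 6, \mathbf{c}_6, q_3) = \mathcal{A}(3, 3, \mathbf{c}_6 = \mathbf{x}_2 + \mathbf{x}_3, q_3)$

Her next step is to calculate the following

$$a'_2 = a_1 + a_5 = \mathcal{A}(3, 2, \mathbf{x}_1, q_2) + \mathcal{A}(3, 2, \mathbf{c}_5 = \mathbf{x}_1 + \mathbf{x}_2, q_2) = \mathcal{A}(3, 2, \mathbf{x}_2, q_2),$$
$$a'_3 = a_3 + a_6 = \mathcal{A}(3, 3, \mathbf{x}_3, q_3) + \mathcal{A}(3, 3, \mathbf{c}_6 = \mathbf{x}_2 + \mathbf{x}_3, q_3) = \mathcal{A}(3, 3, \mathbf{x}_2, q_3),$$

and to assign $a'_1 = a_2$. Finally, Alice retrieves the value of $x_{2,i}$ by applying the reconstruction algorithm

$$\mathcal{C}^*(8, 4, n; i, a_1, \ldots, a_8) = \mathcal{C}(3, n/4; i, a'_1, a'_2, a'_3)$$
$$= \mathcal{C}(3, n/4; i, \mathcal{A}(3, 1, \mathbf{x}_2, q_1), \mathcal{A}(3, 2, \mathbf{x}_2, q_2), \mathcal{A}(3, 3, \mathbf{x}_2, q_3))$$
$$= x_{2,i} = x_{n/4+i}.$$

□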
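To make Example 2 concrete, the sketch below emulates a 3-server linear protocol on the eight coded servers. Several things here are assumptions for illustration only. The example leaves the 3-server protocol abstract, so a simple stand-in is used: queries a, b, a + b + e_i, each answered with the inner product. The recovery sets for the first and second parts are read off Tables I and II, while those for the third and fourth parts are filled in by the same cyclic pattern. All names are hypothetical and indices are 0-based.

```python
import secrets

def dot(q, x):
    """Inner product over GF(2)."""
    return sum(a & b for a, b in zip(q, x)) % 2

def xor_vec(u, v):
    """Component-wise sum over GF(2)."""
    return [a ^ b for a, b in zip(u, v)]

def encode(parts):
    """Coded data c_1..c_8 of Example 2 (servers are 0-based here)."""
    x1, x2, x3, x4 = parts
    return [x1, x2, x3, x4,
            xor_vec(x1, x2), xor_vec(x2, x3), xor_vec(x3, x4), xor_vec(x4, x1)]

# Three disjoint recovery sets per part; set number j receives query q_j.
RECOVERY_SETS = {
    0: [[0], [1, 4], [3, 7]],  # x1 = c1 = c2 + c5 = c4 + c8   (Table I)
    1: [[1], [0, 4], [2, 5]],  # x2 = c2 = c1 + c5 = c3 + c6   (Table II)
    2: [[2], [1, 5], [3, 6]],  # x3 = c3 = c2 + c6 = c4 + c7   (by symmetry)
    3: [[3], [2, 6], [0, 7]],  # x4 = c4 = c3 + c7 = c1 + c8   (by symmetry)
}

def base_queries(length, i):
    """Stand-in 3-server linear PIR: the three queries sum to e_i."""
    a = [secrets.randbelow(2) for _ in range(length)]
    b = [secrets.randbelow(2) for _ in range(length)]
    c = xor_vec(a, b)
    c[i] ^= 1
    return [a, b, c]

def retrieve(servers, part, i):
    """Retrieve bit i (0-based) of the given part (0-based)."""
    q = base_queries(len(servers[0]), i)
    queries = [q[2]] * len(servers)  # servers outside the sets get any query
    for j, rset in enumerate(RECOVERY_SETS[part]):
        for h in rset:
            queries[h] = q[j]
    answers = [dot(queries[h], servers[h]) for h in range(len(servers))]
    bit = 0
    for rset in RECOVERY_SETS[part]:
        for h in rset:
            bit ^= answers[h]  # linearity turns this into q_j . x_part
    return bit
```

Summing the answers inside one recovery set yields, by linearity, the answer the stand-in protocol would have given on the uncoded part; summing across the three sets then plays the role of the reconstruction algorithm.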


In this example, we have not specified the queries to the other servers simply because they do not matter for the reconstruction. Thus we can assign any query to them while preserving the privacy property. However, we note that the requests to the servers might differ depending on the part of the database from which Alice wants to read the bit, and the different requests might reveal the identity of the data (in case the algorithm $\mathcal{A}$ does not depend on j, this is clearly not a problem). A simple solution is to ask every server to return all possible outputs, so that it simulates each of the k possible requests. For example, if in the first scenario Alice assigns the query $q_1$ to the first server, then the server’s output consists of three parts:

$$\mathcal{A}^*(8, 4, 1, \mathbf{x}_1, q_1) = (\mathcal{A}(3, 1, \mathbf{x}_1, q_1), \mathcal{A}(3, 2, \mathbf{x}_1, q_1), \mathcal{A}(3, 3, \mathbf{x}_1, q_1)).$$

That way, Alice can choose the information required in order to compute the bit she wants to retrieve, and the server cannot deduce which part of the database the bit is read from. Yet another solution, which improves the download complexity, will be given as part of the proof of Theorem 5 below.

As we saw in Example 2, there are two important ingredients in the construction of coded PIR protocols:

1) A k-server linear PIR protocol.

2) An [m, s] linear code with special properties, which are specified next.


Definition 4. We say that an $s \times m$ binary matrix G has property $A_k$ if for every $i \in [s]$, there exist k disjoint subsets of columns of G, each of which adds up to the vector of weight one with the single 1 in position i. A binary linear [m, s] code C will be called a k-server PIR code if there exists a generator matrix G for C with property $A_k$. Equivalently, let $c = uG$ be the encoding of a message $u \in \{0,1\}^s$. Then C is a k-server PIR code if for every $i \in [s]$, there exist k disjoint sets $R_1, \ldots, R_k \subseteq [m]$ such that

$$u_i = \sum_{j \in R_1} c_j = \cdots = \sum_{j \in R_k} c_j.$$


The construction of k-server PIR codes is deferred to Section IV. In particular, we will be interested in finding, for given s and k, the optimal (smallest) m such that an [m, s] k-server PIR code exists; this optimal value of m will be denoted by $A(s,k)$. In terms of the minimum distance, let us briefly note that the minimum distance of a k-server PIR code is at least k.
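For small matrices, the property in Definition 4 can be checked by exhaustive search. The sketch below is only practical for a handful of columns and is not an algorithm from the paper; the function names are hypothetical, and G is given as a list of s rows of 0/1 entries.

```python
from itertools import combinations

def recovery_sets(G, i):
    """All column subsets of G whose GF(2) sum is the unit vector e_i."""
    s, m = len(G), len(G[0])
    cols = [tuple(G[r][c] for r in range(s)) for c in range(m)]
    target = tuple(1 if r == i else 0 for r in range(s))
    found = []
    for size in range(1, m + 1):
        for subset in combinations(range(m), size):
            total = [0] * s
            for c in subset:
                total = [t ^ b for t, b in zip(total, cols[c])]
            if tuple(total) == target:
                found.append(frozenset(subset))
    return found

def max_disjoint(sets, used=frozenset()):
    """Largest number of pairwise disjoint sets, by exhaustive search."""
    best = 0
    for idx, cand in enumerate(sets):
        if cand & used:
            continue
        best = max(best, 1 + max_disjoint(sets[idx + 1:], used | cand))
    return best

def has_property_A(G, k):
    """True if every unit vector has k disjoint recovery sets of columns."""
    return all(max_disjoint(recovery_sets(G, i)) >= k for i in range(len(G)))

# Generator matrix of the [8,4] code used in Example 2.
G_EXAMPLE_2 = [
    [1, 0, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 1],
]
```

Calling `has_property_A(G_EXAMPLE_2, 3)` confirms that this generator matrix supports three disjoint recovery sets per part, while the same call with k = 4 fails, which is consistent with the minimum distance of the code being 3.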

We finish this section with the next theorem, which provides the general result for the construction of coded PIR protocols. In order to analyze the communication complexity, we denote the number of bits uploaded and downloaded by a k-server linear PIR protocol $\mathcal{P}$ by $U(\mathcal{P}; n, k)$ and $D(\mathcal{P}; n, k)$, respectively. For an (m, s)-server coded PIR protocol $\mathcal{P}^*$, the quantities $U^*(\mathcal{P}^*; n, m, s)$ and $D^*(\mathcal{P}^*; n, m, s)$ are defined similarly.


Theorem 5. If there exists an [m, s] k-server PIR code C and a k-server linear PIR protocol $\mathcal{P}$, then there exists an (m, s)-server coded PIR protocol $\mathcal{P}^*$. Furthermore,

$$U^*(\mathcal{P}^*; n, m, s) = m \cdot U(\mathcal{P}; n/s, k),$$
$$D^*(\mathcal{P}^*; n, m, s) = m \cdot D(\mathcal{P}; n/s, k).$$

Proof: The database x is partitioned into s parts $\mathbf{x}_1, \ldots, \mathbf{x}_s$. Let G be a generator matrix of the code C with property $A_k$. Then the data stored in the m servers is encoded according to

$$(\mathbf{c}_1, \ldots, \mathbf{c}_m) = (\mathbf{x}_1, \ldots, \mathbf{x}_s) \cdot G.$$

Let $\mathcal{P}(\mathcal{Q}, \mathcal{A}, \mathcal{C})$ be a k-server linear PIR protocol. We will show how to construct an (m, s)-server coded PIR protocol.

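Assume Alice wants to read the i-th bit, $i \in [n/s]$, from the $\ell$-th part of the database, that is, the bit $x_{\ell,i}$. First she invokes the algorithm $\mathcal{Q}$ to receive k queries

$$\mathcal{Q}(k, n/s; i) = (q_1, \ldots, q_k).$$

According to the k-server PIR code C, there exist k mutually disjoint sets $R_{\ell,1}, \ldots, R_{\ell,k} \subseteq [m]$ such that for all $j \in [k]$, $\mathbf{x}_\ell$ is a linear function of the data stored in the servers belonging to the set $R_{\ell,j}$, that is,

$$\mathbf{x}_\ell = \sum_{h \in R_{\ell,j}} \mathbf{c}_h.$$

Alice assigns the output of the algorithm $\mathcal{Q}^*$ to be

$$\mathcal{Q}^*(m, s, n; i) = (q^*_1, \ldots, q^*_m),$$

where for all $j \in [k]$ and $h \in R_{\ell,j}$, $q^*_h = q_j$. The other queries $q^*_h$, where $h \notin \bigcup_{j \in [k]} R_{\ell,j}$, can be assigned arbitrarily.

Then Alice sends the query $q^*_h$ to the h-th server, $h \in [m]$, and receives the answer

$$a^*_h = \mathcal{A}^*(m, s, h, \mathbf{c}_h, q^*_h) = (\mathcal{A}(k, 1, \mathbf{c}_h, q^*_h), \ldots, \mathcal{A}(k, k, \mathbf{c}_h, q^*_h)).$$

From these answers she takes only the parts which are required to invoke the algorithm $\mathcal{C}$, and which are determined by

$$a_h = \mathcal{A}(k, j, \mathbf{c}_h, q^*_h)$$

for $h \in R_{\ell,j}$. Then, she assigns for $j \in [k]$,

$$a'_j = \sum_{h \in R_{\ell,j}} a_h,$$

and finally Alice retrieves the value of $x_{\ell,i}$ by applying the reconstruction algorithm

$$\mathcal{C}(k, n/s; i, a'_1, a'_2, \ldots, a'_k) = x_{\ell,i}.$$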

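The correctness of the last step results from the linearity of the PIR protocol $\mathcal{P}$, since for all $j \in [k]$,

$$a'_j = \sum_{h \in R_{\ell,j}} a_h = \sum_{h \in R_{\ell,j}} \mathcal{A}(k, j, \mathbf{c}_h, q^*_h) = \mathcal{A}\Big(k, j, \sum_{h \in R_{\ell,j}} \mathbf{c}_h, q_j\Big) = \mathcal{A}(k, j, \mathbf{x}_\ell, q_j),$$

where we used $q^*_h = q_j$ for every $h \in R_{\ell,j}$. Therefore,

$$\mathcal{C}(k, n/s; i, a'_1, a'_2, \ldots, a'_k) = \mathcal{C}(k, n/s; i, \mathcal{A}(k, 1, \mathbf{x}_\ell, q_1), \ldots, \mathcal{A}(k, k, \mathbf{x}_\ell, q_k)) = x_{\ell,i}.$$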

Let us add the following modification to this proof, in order to keep private the part from which Alice reads a bit. When Alice invokes the algorithm $\mathcal{Q}$ and receives the k queries $\mathcal{Q}(k, n/s; i) = (q_1, \ldots, q_k)$, she also flips coins to choose uniformly at random a permutation $\sigma$ of the elements in $[k]$, and assigns for all $j \in [k]$, $q'_j = q_{\sigma(j)}$. She continues with the algorithm and sets, for each $j \in [k]$ and $h \in R_{\ell,j}$, $q^*_h = q'_j$. Then, the h-th server responds with the answer

$$a^*_h = \mathcal{A}^*(m, s, h, \mathbf{c}_h, q^*_h) = \mathcal{A}(k, \sigma(j), \mathbf{c}_h, q^*_h).$$

Next she calculates $a''_j$ to be

$$a''_j = \sum_{h \in R_{\ell,j}} a^*_h = \sum_{h \in R_{\ell,j}} \mathcal{A}(k, \sigma(j), \mathbf{c}_h, q^*_h) = \mathcal{A}\Big(k, \sigma(j), \sum_{h \in R_{\ell,j}} \mathbf{c}_h, q_{\sigma(j)}\Big) = \mathcal{A}(k, \sigma(j), \mathbf{x}_\ell, q_{\sigma(j)}).$$

Finally, by assigning

$$a'_j = a''_{\sigma^{-1}(j)} = \mathcal{A}(k, \sigma(\sigma^{-1}(j)), \mathbf{x}_\ell, q_{\sigma(\sigma^{-1}(j))}) = \mathcal{A}(k, j, \mathbf{x}_\ell, q_j),$$

she completes the last step of the algorithm. We can see that the privacy of the s parts is kept, since each server is required to invoke the algorithm $\mathcal{A}$ with the parameter $\sigma(j)$. Since the permutation $\sigma$ was chosen uniformly at random, the distribution of $\sigma(j)$ is identical for any choice of $\ell$, the part of the database from which Alice wants to retrieve a bit. Lastly, the privacy of the bit that Alice reads is guaranteed by the privacy of the PIR protocol $\mathcal{P}$.

The coded PIR protocol $\mathcal{P}^*$ uploads $U(\mathcal{P}; n/s, k)$ bits to each server and downloads $D(\mathcal{P}; n/s, k)$ bits from each server. Thus, we get that $U^*(\mathcal{P}^*; n, m, s) = m \cdot U(\mathcal{P}; n/s, k)$ and $D^*(\mathcal{P}^*; n, m, s) = m \cdot D(\mathcal{P}; n/s, k)$. ■

IV. Constructions of Coded PIR Schemes


In this section we give several methods to construct k-server PIR codes with the properties specified in Section III. As we shall see, the properties of k-server PIR codes are similar to those of some existing codes in the literature, such as one-step majority logic codes [10] and codes with locality and availability [15], [21], [22], [24], and to combinatorial objects such as Steiner systems. We point out that the simple parity code is the optimal 2-server PIR code, and thus $A(s,2) = s + 1$. Therefore, we focus in this section only on the case where $k > 2$.


A. The Cubic Construction 

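Our first construction is based on the geometry of multidimensional cubes. Let us assume that $s = \sigma^{k-1}$ for some positive integer $\sigma$. We will give a construction of an [m, s] k-server systematic PIR code, where $m = \sigma^{k-1} + (k-1)\sigma^{k-2} = s + (k-1)s^{\frac{k-2}{k-1}}$. This code will be denoted by $C_A(\sigma, k)$.

The information bits in $C_A(\sigma, k)$ will be denoted by $x_{i_1, i_2, \ldots, i_{k-1}}$, where $1 \le i_j \le \sigma$ for $j \in [k-1]$. The $(k-1)\sigma^{k-2}$ redundancy bits, which are partitioned into k − 1 groups, are denoted and defined as follows:

$$p^{(\xi)}_{i_1, \ldots, i_{\xi-1}, i_{\xi+1}, \ldots, i_{k-1}} = \sum_{h=1}^{\sigma} x_{i_1, \ldots, i_{\xi-1}, h, i_{\xi+1}, \ldots, i_{k-1}}$$

for $\xi \in [k-1]$.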

In the next example we demonstrate the construction of the code $C_A(\sigma, 3)$.

Example 3. Assume that k = 3 and $s = \sigma^2$ for some positive integer $\sigma$. The code $C_A(\sigma, 3)$ has in this case $2\sigma$ redundancy bits. The codewords are represented in a square array of size $(\sigma+1) \times (\sigma+1)$, without the bit in the bottom right corner. The information bits are stored in a $\sigma \times \sigma$ subsquare, and the remaining bits are the redundancy bits, chosen such that all rows and all columns are of even weight; see Fig. 2. So, for $i \in [\sigma]$, the redundancy bit appended to the i-th row is $\sum_{j=1}^{\sigma} x_{i,j}$, and the redundancy bit appended to the i-th column is $\sum_{j=1}^{\sigma} x_{j,i}$.






Fig. 2. The cubic construction for a 3-server PIR code: the information bits $x_{i,j}$ form a $\sigma \times \sigma$ array, with one redundancy bit appended to every row and to every column. The bit $x_{i,j}$ can be recovered by itself, by the bits in the i-th row besides $x_{i,j}$, and by the bits in the j-th column besides $x_{i,j}$.

One can verify that every information bit $x_{i_1,i_2}$, for $i_1,i_2 \in [\sigma]$, has three mutually disjoint sets such that $x_{i_1,i_2}$
is a linear function of the bits in each set. These sets are $\{x_{i_1,i_2}\}$, $\{x_{i_1,1},\ldots,x_{i_1,i_2-1},x_{i_1,i_2+1},\ldots,x_{i_1,\sigma},p^{(1)}_{i_1}\}$, and
$\{x_{1,i_2},\ldots,x_{i_1-1,i_2},x_{i_1+1,i_2},\ldots,x_{\sigma,i_2},p^{(2)}_{i_2}\}$. Note that the cell in the bottom right corner was removed since it is
not used in any of the recovering sets. Finally, we conclude from this example that

\[ A(\sigma^2,3) \le \sigma^2 + 2\sigma, \]

and the storage overhead is $1 + \frac{2}{\sigma}$, which approaches 1 when $\sigma$ becomes large enough.


□ 
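The square case of this example can be checked numerically. The following is a minimal sketch, assuming the row/column parity layout of Fig. 2; the function names `encode_square` and `recover` are illustrative and not part of the paper.

```python
import random

def encode_square(x):
    """x is a sigma-by-sigma list of bits; returns (row parities, column parities)."""
    sigma = len(x)
    row_par = [sum(x[i]) % 2 for i in range(sigma)]
    col_par = [sum(x[i][j] for i in range(sigma)) % 2 for j in range(sigma)]
    return row_par, col_par

def recover(x, row_par, col_par, i, j):
    """Recover x[i][j] three ways, each reading a disjoint set of stored bits."""
    sigma = len(x)
    direct = x[i][j]                                   # the bit itself
    via_row = (row_par[i] + sum(x[i][c] for c in range(sigma) if c != j)) % 2
    via_col = (col_par[j] + sum(x[r][j] for r in range(sigma) if r != i)) % 2
    return direct, via_row, via_col

sigma = 4                                              # assumed toy size, s = 16
random.seed(0)
x = [[random.randint(0, 1) for _ in range(sigma)] for _ in range(sigma)]
row_par, col_par = encode_square(x)

# Every information bit must agree across its three recovery sets.
all_ok = all(recover(x, row_par, col_par, i, j) == (x[i][j],) * 3
             for i in range(sigma) for j in range(sigma))
overhead = (sigma**2 + 2 * sigma) / sigma**2           # equals 1 + 2/sigma
```

The three reads touch the bit itself, the rest of its row plus the row parity, and the rest of its column plus the column parity, so they never overlap.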


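Next, we explicitly prove that the code $C_A(\sigma,k)$ is a $k$-server PIR code.

Theorem 6. For two positive integers $\sigma$ and $k$, the code $C_A(\sigma,k)$ is a $k$-server PIR code. In particular, we get that for
any positive $s$

\[ A(s,k) \le s + (k-1)\left\lceil s^{\frac{1}{k-1}} \right\rceil^{k-2}. \]

Proof: For any information bit $x_{i_1,\ldots,i_{k-1}}$ in the code $C_A(\sigma,k)$, the following $k$ sets

\[ R^{(0)}_{i_1,\ldots,i_{k-1}} = \{x_{i_1,\ldots,i_{k-1}}\}, \]
\[ R^{(\xi)}_{i_1,\ldots,i_{k-1}} = \{x_{i_1,\ldots,i_{\xi-1},j,i_{\xi+1},\ldots,i_{k-1}} : j \in [\sigma],\ j \ne i_\xi\} \cup \{p^{(\xi)}_{i_1,\ldots,i_{\xi-1},i_{\xi+1},\ldots,i_{k-1}}\}, \quad \text{for } \xi = 1,2,\ldots,k-1, \]

are all disjoint, and for all $1 \le \xi \le k-1$ we have

\[ x_{i_1,\ldots,i_{k-1}} = \sum_{x \in R^{(\xi)}_{i_1,\ldots,i_{k-1}}} x. \]

Therefore, $C_A(\sigma,k)$ is a $k$-server PIR code.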

For the general construction of arbitrary values of $s$, let $\sigma$ be such that $(\sigma-1)^{k-1} < s \le \sigma^{k-1}$. Using the code
$C_A(\sigma,k)$, we add $k-1$ sets, each of $\sigma^{k-2}$ redundancy bits, to the information bits to form a $k$-server PIR code.
In case that $s < \sigma^{k-1}$, we simply treat the missing bits in the cube as zeros. Hence,

\[ A(s,k) \le s + (k-1)\sigma^{k-2} = s + (k-1)\left\lceil s^{\frac{1}{k-1}} \right\rceil^{k-2}. \]

For a fixed $k$, the asymptotic behavior of the storage overhead in the cubic construction is given by $1 + O(s^{-\frac{1}{k-1}})$,
which already proves that the asymptotic storage overhead approaches 1, that is,

\[ \lim_{s \to \infty} \frac{A(s,k)}{s} = 1. \]

However, as we shall see in the sequel, it is still possible to improve the value of $A(s,k)$ for specific values of $s$
and $k$, and to find constructions whose storage overhead approaches 1 faster than the decay exponent given
by the cubic construction. Lastly, we note that a recursive form of this construction appears in fTT] for the purpose 
of constructing batch codes. 


B. PIR Codes Based on Steiner Systems 

The idea behind a construction of any $k$-server PIR code $C$ is to form, for every information bit, $k$ mutually
disjoint subsets of $[m]$, such that the information bit can be recovered by a linear combination of the bits in each
set. Assume that $C$ is a systematic $[m, m - r = s]$ $k$-server PIR code. Then, we can partition its bits into two parts;
the first one consists of the $s$ information bits, denoted by $x_1,\ldots,x_s$, and the second one is the $r$ redundancy bits
$p_1,\ldots,p_r$, where every redundancy bit $p_i$ is characterized by a subset $S_i \subseteq [s]$ such that $p_i = \sum_{j \in S_i} x_j$.

According to this representation of systematic codes, every collection $\mathcal{S} = (S_1,\ldots,S_r)$ of subsets of $[s]$ defines
a systematic $[s+r, s]$ linear code $C_B(\mathcal{S})$. In the next lemma, we give sufficient (but not necessary) conditions such
that the code $C_B(\mathcal{S})$ is a $k$-server PIR code.

Lemma 7. Let $\mathcal{S} = (S_1,\ldots,S_r)$ be a collection of subsets of $[s]$, such that

1) For all $i \in [s]$, $i$ appears in at least $k-1$ subsets,

2) For all $j \ne \ell \in [r]$, $|S_j \cap S_\ell| \le 1$.

Then, $C_B(\mathcal{S})$ is a $k$-server PIR code.

Proof: For any information bit $x_i$, $i \in [s]$, according to the first condition there exist some $k-1$ subsets
$S_{i_1},\ldots,S_{i_{k-1}}$ such that $i \in S_{i_j}$ for $j \in [k-1]$. For each $j \in [k-1]$, let $R_j$ be the set $R_j = \{x_\ell : \ell \in S_{i_j}, \ell \ne i\} \cup \{p_{i_j}\}$,
and finally let $R_k = \{x_i\}$. According to the second condition all these $k$ sets are mutually disjoint.
Finally, it is straightforward to verify that $x_i$ is the sum of the bits in every set, and thus $C_B(\mathcal{S})$ is a $k$-server PIR
code. ■

After determining the conditions in which the code $C_B(\mathcal{S})$ is a $k$-server PIR code, we are left with the problem of
finding such collections of subsets. Our approach to fulfill the conditions stated in Lemma 7 is to search for existing
combinatorial objects in the literature. One such object is a Steiner system. A Steiner system with parameters $t$,
$\ell$, $n$, denoted by $S(t,\ell,n)$, is an $n$-element set $S$ together with a set of $\ell$-element subsets of $S$ (called blocks) with
the property that each $t$-element subset of $S$ is contained in exactly one block. It is also commonly known that the
number of subsets in a Steiner system $S(t,\ell,n)$ is $\binom{n}{t}/\binom{\ell}{t}$ and every element is contained in exactly $\binom{n-1}{t-1}/\binom{\ell-1}{t-1}$
subsets.

In order to satisfy the conditions in Lemma 7 we chose Steiner systems with $t = 2$ so the intersection of every
two subsets contains at most one element. Furthermore, in a Steiner system $S(2,\ell,n)$, the number of subsets is
$\binom{n}{2}/\binom{\ell}{2} = \frac{n(n-1)}{\ell(\ell-1)}$ and every element is contained in $(n-1)/(\ell-1)$ subsets. Thus, we conclude with
the following theorem.


Theorem 8. If a Steiner system $S(2, \frac{s-1}{k-1}+1, s)$ exists, then there exists an $[m = s + r, s]$ $k$-server PIR code where
$r = \frac{s(k-1)^2}{s+k-2}$. Thus, under this assumption we have

\[ A(s,k) \le s + \frac{s(k-1)^2}{s+k-2}. \tag{1} \]

Moreover, if a Steiner system $S(2,k-1,r)$ exists, then we have a $k$-server PIR code with parameters
$[m, s] = \left[ r + \frac{r(r-1)}{(k-1)(k-2)}, \frac{r(r-1)}{(k-1)(k-2)} \right]$. Thus,

\[ A\left( \frac{r(r-1)}{(k-1)(k-2)}, k \right) \le r + \frac{r(r-1)}{(k-1)(k-2)}. \tag{2} \]

Proof: Let $\mathcal{S}$ be a Steiner system $S(2, \frac{s-1}{k-1}+1, s)$, so the number of subsets in $\mathcal{S}$ is

\[ r = \frac{s(s-1)}{\left(\frac{s-1}{k-1}+1\right)\frac{s-1}{k-1}} = \frac{s(k-1)^2}{s+k-2}, \]

and every element is contained in

\[ \frac{s-1}{(s-1)/(k-1)} = k-1 \]

subsets. We also have that the intersection of every two subsets contains at most one element, so the conditions
in Lemma 7 hold and $C_B(\mathcal{S})$ is a $k$-server PIR code. To prove the bound given in (2), let $t = \binom{r}{2}/\binom{k-1}{2} = \frac{r(r-1)}{(k-1)(k-2)}$ be the
number of $(k-1)$-element subsets of $S(2,k-1,r)$, and denote them by $S_1, S_2, \ldots, S_t \subseteq [r]$. Let us construct the
dual system $\mathcal{S}'$, which consists of $r$ subsets of $[t]$, each with $\frac{r-1}{k-2}$ elements, denoted by $S'_1, S'_2, \ldots, S'_r$, and
has the property that $S'_i = \{j : j \in [t], i \in S_j\}$. Every $j \in [t]$ lies in exactly $k-1$ of the sets $S'_i$, and any two of the sets $S'_i$ share exactly one element, so Lemma 7 applies and we can construct the code $C_B(\mathcal{S}')$.
It is clear that the redundancy of $C_B(\mathcal{S}')$ is given by $r$, and the code length is given by $r + t = r + \frac{r(r-1)}{(k-1)(k-2)}$. ■

Example 4. A finite projective plane of order $q$, with the lines as blocks, is an $S(2, q+1, q^2+q+1)$ Steiner
system. Since $q + 1 = \frac{(q^2+q+1)-1}{q+1} + 1$, we conclude that there exists an $[s+r, s]$ $(q+2)$-server PIR code, with
$s = q^2+q+1$ information bits and

\[ r = \frac{(q^2+q+1)(q+1)^2}{q^2+q+1+q+2-2} = \frac{(q^2+q+1)(q+1)^2}{(q+1)^2} = q^2+q+1 \]

redundancy bits. Note that the storage overhead of this code is 2.

□
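As a concrete instance of Lemma 7 and Example 4, the sketch below takes $q = 2$, i.e. the Fano plane $S(2,3,7)$, and checks both conditions of the lemma and the resulting recovery sets. The block labelling and the helper names `lemma7_k` and `recovery_sets` are assumptions made for illustration.

```python
from itertools import combinations
import random

def lemma7_k(blocks, s):
    """Largest k certified by Lemma 7, or None if two blocks share 2+ elements."""
    if any(len(set(a) & set(b)) > 1 for a, b in combinations(blocks, 2)):
        return None
    min_deg = min(sum(i in b for b in blocks) for i in range(s))
    return min_deg + 1

def recovery_sets(blocks, i, k):
    """Recovery sets for x_i; ('x', l) is an information bit, ('p', j) a parity bit."""
    chosen = [j for j, b in enumerate(blocks) if i in b][:k - 1]
    sets = [{('x', i)}]
    for j in chosen:
        sets.append({('x', l) for l in blocks[j] if l != i} | {('p', j)})
    return sets

# One common labelling of the 7 lines of the Fano plane on points 0..6.
fano = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]
s, r = 7, len(fano)
k = lemma7_k(fano, s)

random.seed(1)
x = [random.randint(0, 1) for _ in range(s)]
p = [sum(x[l] for l in b) % 2 for b in fano]          # p_j = sum of x_l over block j
value = {('x', l): x[l] for l in range(s)}
value.update({('p', j): p[j] for j in range(r)})

def check(i):
    sets = recovery_sets(fano, i, k)
    disjoint = sum(len(t) for t in sets) == len(set().union(*sets))
    correct = all(sum(value[b] for b in t) % 2 == x[i] for t in sets)
    return disjoint and correct

all_ok = all(check(i) for i in range(s))
overhead = (s + r) / s
```

With $q = 2$ this gives a 4-server PIR code with 7 information bits and 7 parity bits, matching the storage overhead of 2 stated in the example.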
In order to evaluate the bound in Theorem 8, one is required to figure out the existence of $S(t,\ell,n)$ in general.
Indeed, Wilson's theorem claims that for a fixed $\ell$ and sufficiently large $n$, a Steiner system $S(2,\ell,n)$ exists
given that the following two conditions (also known as divisibility conditions) are satisfied (see [26]-[28] for more
details):

1) $n(n-1) \equiv 0 \pmod{\ell(\ell-1)}$, and

2) $n - 1 \equiv 0 \pmod{\ell-1}$.

Wilson's theorem guarantees the existence of $S(2,k-1,r)$ for infinitely many values of $r$. Hence, for a fixed $k$,
there are arbitrarily large values of $r$ such that the bound in (2) holds. Hence, the redundancy behaves asymptotically
according to $A(s,k) - s = O(s^{1/2})$, which improves upon the cubic construction.


C. One-step Majority Logic Codes 

One-step majority logic decoding is a method to perform fast decoding by looking at disjoint parity check
constraints that only intersect on a single bit (see [10], Chapter 8). These parity check constraints correspond to the
codewords in the dual code, and hence, for a linear code $[n,k,d]$, the goal is to find, for each $i \in [n]$, a set of
codewords in the dual code that intersect only on the $i$-th bit. These codewords are said to be orthogonal on the
$i$-th bit. The maximum number of such orthogonal vectors in the dual code (for every bit) is denoted by $J$, and if
$J = d - 1$, then the code is called completely orthogonalizable.

In other words, if an $[n,k]$ code has $J$ orthogonal vectors on the $i$-th coordinate for some $i \in [n]$, then its dual
code has $k - 1 = J$ codewords that are orthogonal on coordinate $i$. Assume that these codewords are given by

\[ x_i = x_{\ell^j_1} + x_{\ell^j_2} + \cdots + x_{\ell^j_{p_j}}, \qquad \forall\, 1 \le j \le J, \tag{3} \]

where the sets $\{i\}$, and $\{\ell^j_1, \ldots, \ell^j_{p_j}\}$ for $j \in [J]$ are mutually disjoint. Such an $[n,k,d]$ code with $J$ orthogonal
vectors for each $i \in [n]$ is called a one-step majority logic code with $J$ orthogonal vectors. Note that the definition
of one-step majority logic codes is almost identical to the definition of PIR codes. While one-step
majority logic codes guarantee that orthogonal vectors (or mutually disjoint sets) exist for all the bits in the code,
in PIR codes we require this property only for the $s$ information bits. While it is not always straightforward to
construct an appropriate generator matrix from a given code such that the $k$-server PIR property holds, for the case
of one-step majority logic codes, we can always pick a systematic generator matrix and hence the PIR property
follows. Lastly we note that the idea of using one-step majority logic codes was motivated by the recent work on
codes for locality and availability in [16].

We demonstrate the construction of such codes in the following example. 


Example 5. Consider a (15,7) cyclic code generated by the polynomial

\[ g(x) = 1 + x^4 + x^6 + x^7 + x^8. \]

The parity-check matrix of this code in the systematic form, with rows $h_0, h_1, \ldots, h_7$ and columns indexed by $0, 1, \ldots, 14$, is given by

\[
H = \begin{pmatrix}
1&0&0&0&0&0&0&0&1&1&0&1&0&0&0\\
0&1&0&0&0&0&0&0&0&1&1&0&1&0&0\\
0&0&1&0&0&0&0&0&0&0&1&1&0&1&0\\
0&0&0&1&0&0&0&0&0&0&0&1&1&0&1\\
0&0&0&0&1&0&0&0&1&1&0&1&1&1&0\\
0&0&0&0&0&1&0&0&0&1&1&0&1&1&1\\
0&0&0&0&0&0&1&0&1&1&1&0&0&1&1\\
0&0&0&0&0&0&0&1&1&0&1&0&0&0&1
\end{pmatrix}.
\]

We observe that the following codewords in the dual code

\[
\begin{array}{rcl}
h_3 &=& (0\ 0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 1\ 1\ 0\ 1), \\
h_1 + h_5 &=& (0\ 1\ 0\ 0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 1\ 1), \\
h_0 + h_2 + h_6 &=& (1\ 0\ 1\ 0\ 0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 1), \\
h_7 &=& (0\ 0\ 0\ 0\ 0\ 0\ 0\ 1\ 1\ 0\ 1\ 0\ 0\ 0\ 1)
\end{array}
\]

are orthogonal on coordinate 14. That gives us five mutually disjoint sets $\{3,11,12\}$, $\{1,5,13\}$, $\{0,2,6\}$, $\{7,8,10\}$,
and $\{14\}$ that are required by the definition of PIR codes to make five different queries on server 14. The same statement is
correct for all other coordinates due to the cyclicity of the code. So, $C$ is a 5-server PIR code. The storage overhead
of the coded PIR scheme based on $C$ is given by $\frac{15}{7}$, which is significantly better than the uncoded PIR. □
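The orthogonality claim in this example can be reproduced mechanically. The sketch below rebuilds a systematic parity-check matrix from $g(x) = 1 + x^4 + x^6 + x^7 + x^8$ by taking column $c$ to be $x^c \bmod g(x)$ (an assumed, standard way to obtain the systematic form), then forms the four combinations listed above. The helper names `poly_mod` and `add` are illustrative.

```python
def poly_mod(exp, g_exps, deg):
    """x^exp mod g(x) over GF(2), returned as a bitmask of coefficients."""
    g = sum(1 << e for e in g_exps)
    v = 1 << exp
    for d in range(exp, deg - 1, -1):
        if (v >> d) & 1:
            v ^= g << (d - deg)        # cancel the degree-d term
    return v

n, deg = 15, 8
g_exps = [0, 4, 6, 7, 8]               # g(x) = 1 + x^4 + x^6 + x^7 + x^8

# Column c of H holds the coefficients of x^c mod g(x); row i is h_i.
cols = [poly_mod(c, g_exps, deg) for c in range(n)]
H = [[(cols[c] >> i) & 1 for c in range(n)] for i in range(deg)]

def add(*rows):
    """Sum of binary vectors over GF(2)."""
    return [sum(bits) % 2 for bits in zip(*rows)]

checks = [H[3], add(H[1], H[5]), add(H[0], H[2], H[6]), H[7]]
supports = [{c for c, b in enumerate(w) if b} for w in checks]
others = [sup - {14} for sup in supports]   # recovery sets for coordinate 14

pairwise_disjoint = all(others[a].isdisjoint(others[b])
                        for a in range(4) for b in range(a + 1, 4))
```

Each of the four checks contains coordinate 14, and apart from that coordinate their supports do not overlap, which together with the singleton $\{14\}$ gives the five recovery sets.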


There are several algebraic constructions for one-step majority logic codes. However, the explicit relation between
the code length and redundancy is only known for a few of them. Type-1 Doubly Transitive Invariant (DTI) Codes
(see [10], p. 289) are cyclic codes with almost completely orthogonalizable property. An explicit relation between
the code length $m = 2^M - 1$, code dimension ($s$), and the number of orthogonal codewords in the dual code
($J$), is known for specific choices of these parameters:

• Case I. Let $\theta, \ell$ be two positive integers. For $M = 2\theta\ell$ and $J = 2^\ell + 1$, the redundancy of the type-1 DTI
code of length $m$ is given by

\[ r = (2^{\theta+1} - 1)^\ell - 1. \tag{4} \]

We refer to these codes by $C_{C_1}(\theta,\ell)$.

• Case II. Let $\lambda, \ell$ be two positive integers. For $M = \lambda\ell$ and $J = 2^\ell - 1$, the redundancy of the type-1 DTI
code of length $m$ is given by

\[ r = 2^{\lambda\ell} - (2^\lambda - 1)^\ell - 1. \tag{5} \]

We refer to these codes by $C_{C_2}(\lambda,\ell)$.

We refer the reader to [10] for the algebraic construction and the calculation method used in deriving these
parameters.

Theorem 9. For any positive integers $\theta$, $\ell$, and $\lambda$, the Type-1 DTI codes $C_{C_1}(\theta,\ell)$, $C_{C_2}(\lambda,\ell)$ are $(2^\ell + 2)$-server, and
$2^\ell$-server PIR codes, respectively. In particular, we get that

\[ A\left(2^{2\theta\ell} - (2^{\theta+1} - 1)^\ell,\ 2^\ell + 2\right) \le 2^{2\theta\ell} - 1, \]
\[ A\left((2^\lambda - 1)^\ell - 1,\ 2^\ell\right) \le 2^{\lambda\ell} - 1, \]

and hence for any fixed $k$, there exists a family of $k$-server PIR codes with asymptotic storage overhead of $1 + O(s^{-1/2})$.

Proof: We have already shown that a one-step majority logic code with $J$ orthogonal vectors is also a $(J+1)$-server
PIR code. So we are left with only calculating the code dimensions according to the redundancies in (4)
and (5).

• For $C_{C_1}(\theta,\ell)$, the code dimension is given by $s = m - r = 2^{2\theta\ell} - (2^{\theta+1} - 1)^\ell$.

• For $C_{C_2}(\lambda,\ell)$, the code dimension is given by $s = m - r = (2^\lambda - 1)^\ell - 1$.

So the upper bounds are validated. For the asymptotic analysis, we point out that for a given fixed $J$, as the number
of servers grows, the rates of the codes in both cases I and II become arbitrarily close to 1. In particular, when $k$
is fixed and $s$ becomes large, the storage overhead in $C_{C_2}(\lambda,\ell)$ is

\[ \frac{2^{\lambda\ell} - 1}{(2^\lambda - 1)^\ell - 1} \approx \left(\frac{2^\lambda}{2^\lambda - 1}\right)^{\ell} \approx 1 + \frac{\ell}{2^\lambda - 1} = 1 + O(s^{-1/\ell}), \]

which improves on the cubic construction in the asymptotic regime. An even better storage overhead is
achieved by $C_{C_1}(\theta,\ell)$ codes in the asymptotic regime:

\[ \frac{2^{2\theta\ell} - 1}{2^{2\theta\ell} - (2^{\theta+1} - 1)^\ell} = 1 + O(s^{-1/2}). \tag{6} \]

Note that this construction not only outperforms the former ones with respect to the upper bound on the
asymptotic storage overhead, but also gives a bound on $A(s,k)$ that does not depend on $k$ in the asymptotic regime.
Considering that the construction based on Steiner systems also results in a similar bound, we ask the following
two questions regarding the asymptotic storage overhead behavior of $k$-server PIR codes.

Question 1. Is (6) the optimal asymptotic behavior for $A(s,k)$?

A more challenging question would be to show the same statement for finite numbers. In particular,

Question 2. Are there any values of $s$ and $k$ such that $A(s,k) < s + \sqrt{s}$?

D. Constant-Weight Codes 

Assume that $G$ is a generator matrix of a systematic $k$-server PIR code $C$ of length $m$ and dimension $s$. We
rewrite $G$ as

\[ G = [I_s \mid M_{s \times r}], \tag{7} \]

where $I_s$ is the $s \times s$ identity matrix and $M_{s \times r}$ corresponds to the $r$ parities in $C$. Let us look at the systematic PIR
codes from a graph theory point of view by interpreting $M$ as the incidence matrix of a bipartite graph $\mathcal{G}$ with
partite sets $\mathcal{X} = \{x_1, x_2, \ldots, x_s\}$ and $\mathcal{P} = \{p_1, p_2, \ldots, p_r\}$, and edges $\mathcal{E} = \{\{x_i, p_j\} \mid M_{ij} = 1\}$. We call $C$
the systematic PIR code based on $\mathcal{G}$. The following lemma is an equivalent statement to Lemma 7.

Lemma 10. Let $\mathcal{G}$ be a bipartite graph with partite sets $\mathcal{X} = \{x_1, x_2, \ldots, x_s\}$, $\mathcal{P} = \{p_1, p_2, \ldots, p_r\}$, and the
incidence matrix $M$, where $k - 1 = \min_{x \in \mathcal{X}} \deg(x)$. Further, assume that $\mathcal{G}$ has no cycles of length 4. If $C$ is the
systematic code based on $\mathcal{G}$ with generator matrix defined in (7), then $C$ is a $k$-server PIR code of length $m = s + r$
and dimension $s$.

Proof: Consider $x_i$ and $k-1$ of its parity neighbors $\{p_{i_1}, \ldots, p_{i_{k-1}}\} \subseteq \mathcal{P}$. Let $R_j \subseteq \mathcal{X}$ denote the
neighbor set of $p_{i_j}$. Since $\mathcal{G}$ is 4-cycle free, the sets $R_j \setminus \{x_i\}$ (for a fixed $i$ and $j \in [k-1]$) are mutually disjoint.
It is also easy to see that $\{x_i\}$, and $\{p_{i_j}\} \cup R_j \setminus \{x_i\}$ (for $j = 1, 2, \ldots, k-1$) form $k$ disjoint recovery sets for
$x_i$. In other words,

\[ p_{i_j} = \sum_{x \in R_j} x \ \Longrightarrow\ x_i = p_{i_j} + \sum_{x \in R_j \setminus \{x_i\}} x \quad \text{for } j = 1, 2, \ldots, k-1. \]

■

Now we are ready to proceed to the final construction of A-server PIR codes, which will be first demonstrated 
by an example. 


Example 6. Consider the 3-server PIR code $C$ given by the systematic generator matrix $G = [I_{10} \mid M_{10 \times 5}]$.
The corresponding bipartite graph is also shown in Fig. 3.

We observe that $\deg(x_i) = 2$ for all $i$ as expected since we only need $k - 1 = 2$ elements in $\mathcal{N}(x_i)$ to recover
$x_i$, where $\mathcal{N}(a)$ is the neighborhood set of $a$. Moreover,

\[ |\mathcal{N}(x_i) \cap \mathcal{N}(x_j)| \le 1 \quad \text{for } i \ne j, \tag{8} \]

which guarantees that the recovering sets for $x_i$ are mutually disjoint. □

Fig. 3. The bipartite graph $\mathcal{G}$ associated with the matrix $M_{10 \times 5}$.


The requirement that $\deg(x_i) \ge k - 1$ can be replaced with $\deg(x_i) = k - 1$ in this construction. This motivates
us to look at constant-weight codes where the codewords are all rows in the matrix $M$, such that the condition
in (8) still holds. For instance, we look at a constant-weight code with weight $k - 1$ and minimum distance $2k - 4$,
which guarantees the condition in (8). Let $M(k,r)$ be the list of codewords of the largest code of length $r$ whose codewords
have weight $k - 1$ and whose minimum distance is $2k - 4$. $C_D(k,r)$ is a $k$-server PIR code defined by its systematic
generator matrix $G = [I \mid M(k,r)]$.

We use the notation $B(n,w,d)$ to denote the maximum number of codewords of length $n$ and weight $w$ with
minimum distance $d$. There are numerous works and studies aiming to determine the precise values of $B(n,w,d)$
in general, but the explicit formula is only found for the trivial cases. A complete collection of the known precise
values and both upper bounds and lower bounds on B{n,w,d) is given in ||^. 


Theorem 11. For any $k$ the code $C_D(k,r)$ is a $k$-server PIR code. In particular, we get that for any positive integer $k$

\[ A(B(r,k-1,2k-4), k) \le B(r,k-1,2k-4) + r. \]

Proof: Let $\mathcal{G}$ be the bipartite graph whose incidence matrix is $M(k,r)$. It is clear that $\deg(x) = k - 1$ for all
$x \in \mathcal{X}$. Also, for any two distinct $x, y \in \mathcal{X}$,

\[ |\mathcal{N}(x)| = |\mathcal{N}(y)| = k-1, \]
\[ |\{\mathcal{N}(x) \cup \mathcal{N}(y)\} \setminus \{\mathcal{N}(x) \cap \mathcal{N}(y)\}| \ge 2k-4 \]
\[ \Longrightarrow\ |\mathcal{N}(x) \cap \mathcal{N}(y)| \le 1. \]

Hence, all of the conditions in Lemma 10 are satisfied and $C_D(k,r)$ is a $k$-server PIR code. To validate the
parameters in the theorem it suffices to note that $|M(k,r)| = B(r,k-1,2k-4)$, so we have $B(r,k-1,2k-4)$ rows in
$G$, and $r$, the length of the codewords in $M(k,r)$, determines the redundancy. ■


Example 7. The only known explicit formula for $B(n,w,d)$ is when $w = 2$ and $d = 2$. It is easy to see that any
two different codewords of weight 2 have distance at least 2 as well. Hence $B(n,2,2) = \binom{n}{2}$. So,

\[ A\left(\binom{r}{2}, 3\right) \le \binom{r}{2} + r. \tag{9} \]

□
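The case $k = 3$ of this construction is small enough to enumerate directly. The sketch below assumes $r = 5$ parity bits, so that each information bit is attached to one weight-2 word of length 5; the helper names `weight2_code` and `has_4cycle` are illustrative.

```python
from itertools import combinations
from math import comb

def weight2_code(r):
    """All binary words of length r and weight 2, stored as sets of parity indices."""
    return [frozenset(c) for c in combinations(range(r), 2)]

def has_4cycle(neighbors):
    """A 4-cycle exists exactly when two information bits share two parity neighbors."""
    return any(len(a & b) >= 2 for a, b in combinations(neighbors, 2))

r = 5                          # assumed number of parity bits
N = weight2_code(r)            # N[i] is the neighborhood of x_i in the bipartite graph
s = len(N)                     # number of information bits, B(r, 2, 2)
m = s + r                      # code length from inequality (9)
four_cycle_free = not has_4cycle(N)
```

For $r = 5$ this gives $s = 10$ and $m = 15$, i.e. a 3-server PIR code with storage overhead 1.5, which agrees with the $10 \times 5$ matrix size of Example 6.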


According to inequality (9), we observe again that the asymptotic behavior of $A(s,3) - s$ is $O(s^{1/2})$. The
construction based on Steiner systems and the last construction based upon constant-weight codes are both equivalent
to the problem of finding bipartite graphs with $s$ vertices on the left and $r$ vertices on the right, where all the left
vertices have degree $k$, the graph has girth at least 6, while minimizing the value of $r$. Clearly, if a Steiner system
with the desired parameters exists, then it is an optimal solution. However, constant-weight codes provide a
solution in the general case, particularly when the desired Steiner system does not exist. The following theorem on
bipartite graphs is both well-known and trivial (see for example Proposition 7 in |^), and shows that by using 
this method of construction for k-server PIR codes, we can not achieve a better asymptotic storage overhead than 
the one achieved by Steiner systems. We include here the proof for the sake of completeness of the results in the 
paper. 
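Theorem 12. Let $\mathcal{G}$ be a bipartite graph with partite sets $\mathcal{X} = \{x_1, x_2, \ldots, x_s\}$ and $\mathcal{P} = \{p_1, \ldots, p_r\}$, where
$\deg(x) = k > 2$ for all $x \in \mathcal{X}$. If the graph has girth at least 6, then $r(r-1) \ge sk(k-1)$. Hence, for a fixed $k$, we
have $r = \Omega(s^{1/2})$.

Proof: Let $\mathcal{N}(x)$ denote the neighbor set of the vertex $x$. Since the graph has no 4-cycles, there are no
$i \ne j \in [s]$ and $a \ne b \in [r]$ such that

\[ \{p_a, p_b\} \subseteq \mathcal{N}(x_i), \quad \text{and} \quad \{p_a, p_b\} \subseteq \mathcal{N}(x_j). \]

Therefore, the $\binom{k}{2}$ pairs of parity vertices inside each $\mathcal{N}(x_i)$ are distinct from the pairs inside every other $\mathcal{N}(x_j)$, so $s\binom{k}{2} \le \binom{r}{2}$, which is equivalent to $r(r-1) \ge sk(k-1)$. ■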



V. Optimal Storage Overhead for fixed s and k 

In this section, we study the value of $A(s,k)$ for small $s$ and $k$; in particular, we give an upper bound on $A(s,k)$
for all values of $s \le 32$ and $k \le 16$. In order to give the best upper bounds, we benefit from a few supplementary
lemmas that together with the constructions introduced in Section IV form a recursive method in deriving the upper
bounds on $A(s,k)$.

Note that the constructions introduced in Section IV do not cover all values of $s$ and $k$. The following lemmas
give simple tools to derive upper bounds for all values of $s$ and $k$.


Lemma 13. We have the following inequalities for all non-negative integer values of $s$, $k$, $s'$ and $k'$:

(a) $A(s,k+k') \le A(s,k) + A(s,k')$,

(b) $A(s+s',k) \le A(s,k) + A(s',k)$,

(c) $A(s,k) \le A(s,k+1) - 1$,

(d) $A(s,k) \le A(s+1,k) - 1$.

Proof: To prove the inequality in (a), assume that $C$ and $C'$ are $k$-server and $k'$-server PIR codes with
parameters $[m,s]$ and $[m',s]$, and their generator matrices are given by $G$ and $G'$ respectively. It is easy to see that the
concatenation of $C$ and $C'$ is a $(k+k')$-server PIR code with parameters $[m+m',s]$ and its generator matrix is
given by $G_{\mathrm{conc}} = [G \mid G']$. To prove the inequality in (b), assume again that $C$ and $C'$ are $k$-server PIR codes with
parameters $[m,s]$ and $[m',s']$, and their generator matrices are given by $G$ and $G'$ respectively. The direct sum
code (also known as the product code) of $C$ and $C'$ is a $k$-server PIR code with parameters $[m+m', s+s']$ whose
generator matrix is given by

\[ G^{*} = \begin{pmatrix} G & 0_{s \times m'} \\ 0_{s' \times m} & G' \end{pmatrix}. \]

To prove (c), let us assume that $C$ is a $(k+1)$-server PIR code with parameters $[m,s]$ and a generator
matrix $G$. According to the definition of PIR codes, for every information bit $u_i$, $i \in [s]$, there exist $k+1$ mutually disjoint sets
$R_{i,1}, \ldots, R_{i,k+1} \subseteq [m]$ such that for all $j \in [k+1]$, $u_i$ is a linear function of the bits in $R_{i,j}$. It is now clear that
deleting one of the coordinates from $G$, or equivalently puncturing the code $C$ in one of its coordinates, can truncate at
most one of these disjoint recovery sets. Hence the punctured code $C_{\mathrm{punc}}$ whose parameters are given by $[m-1,s]$
is a $k$-server PIR code. We postpone the proof of part (d) to the end of this section, where we discuss whether
the PIR property is a property of the generator matrix or whether it can be interpreted as a property of the code itself. ■


Lemma 14. If $k$ is odd, then $A(s,k+1) = A(s,k) + 1$.

Proof: Utilizing part (c) in Lemma 13, it only suffices to show that if $k$ is odd, then $A(s,k+1) \le A(s,k) + 1$.
To do so, assume that $C$ is a $k$-server PIR code with parameters $[m,s]$ and generator matrix $G$. For any $i \in [s]$
we should be able to find $k$ disjoint subsets of columns where the columns in each subset sum up to the vector $e_i$.
If the sum of all columns in $G$ is 0, then clearly the sum of the remaining columns (the ones that are left out of
TABLE III

Upper bound for A(s,k) for small values of s and k. For each k, the value on the left represents the size of the best
PIR code constructions, i.e. A(s,k), and the right column represents the storage overhead associated with
that construction. By Lemma 14 the value of A(s,k) for odd k is given by A(s,k+1) - 1. Starred values are proved
to be optimal.

s\k k=2        k=3        k=4        k=6        k=8        k=10       k=12       k=14       k=16
1   2*   2.00  3*   3.00  4*   4.00  6*   6.00  8*   8.00  10*  10.0  12*  12.0  14*  14.0  16*  16.0
2   3*   1.50  5*   2.50  6*   3.00  9*   4.50  12*  6.00  15*  7.50  18*  9.00  21*  10.5  24*  12.0
3   4*   1.33  6*   2.00  7*   2.33  11*  3.67  14*  4.67  18*  6.00  21*  7.00  25*  8.33  28*  9.33
4   5*   1.25  8    2.00  9    2.25  12*  3.00  15*  3.75  20   5.00  24   6.00  27*  6.75  30*  7.50
5   6*   1.20  10   2.00  11   2.20  13   2.60  19   3.80  24   4.80  26   5.20  29   5.80  31*  6.20
6   7*   1.17  11   1.83  12   2.00  14   2.33  21   3.50  26   4.33  28   4.67  35   5.83  40   6.67
7   8*   1.14  12   1.71  13   1.86  15   2.14  23   3.29  28   4.00  30   4.29  38   5.43  43   6.14
8   9*   1.13  13   1.63  14   1.75  20   2.50  28   3.50  34   4.25  40   5.00  48   6.00  54   6.75
9   10*  1.11  14   1.56  15   1.67  23   2.56  30   3.33  38   4.22  45   5.00  53   5.89  60   6.67
10  11*  1.10  17   1.70  18   1.80  24   2.40  35   3.50  41   4.10  48   4.80  57   5.70  61   6.10
11  12*  1.09  19   1.73  20   1.82  25   2.27  37   3.36  42   3.82  50   4.55  62   5.64  67   6.09
12  13*  1.08  20   1.67  21   1.75  26   2.17  39   3.25  43   3.58  52   4.33  64   5.33  69   5.75
13  14*  1.08  21   1.62  22   1.69  27   2.08  41   3.15  44   3.38  54   4.15  66   5.08  71   5.46
14  15*  1.07  22   1.57  23   1.64  29   2.07  43   3.07  45   3.21  58   4.14  68   4.86  74   5.29
15  16*  1.07  23   1.53  24   1.60  34   2.27  44   2.93  46   3.07  62   4.13  70   4.67  80   5.33
16  17*  1.06  24   1.50  25   1.56  37   2.31  45   2.81  47   2.94  64   4.00  72   4.50  84   5.25
17  18*  1.06  27   1.59  28   1.65  38   2.24  46   2.71  48   2.82  66   3.88  76   4.47  86   5.06
18  19*  1.06  28   1.56  29   1.61  39   2.17  47   2.61  49   2.72  68   3.78  78   4.33  88   4.89
19  20*  1.05  29   1.53  30   1.58  40   2.11  48   2.53  50   2.63  70   3.68  80   4.21  90   4.74
20  21*  1.05  30   1.50  31   1.55  41   2.05  49   2.45  51   2.55  72   3.60  82   4.10  92   4.60
21  22*  1.05  31   1.48  32   1.52  42   2.00  50   2.38  52   2.48  74   3.52  84   4.00  94   4.48
22  23*  1.05  32   1.45  33   1.50  47   2.14  51   2.32  53   2.41  76   3.45  86   3.91  100  4.55
23  24*  1.04  33   1.43  34   1.48  50   2.17  52   2.26  54   2.35  78   3.39  88   3.83  104  4.52
24  25*  1.04  34   1.42  35   1.46  51   2.13  53   2.21  55   2.29  80   3.33  90   3.75  106  4.42
25  26*  1.04  35   1.40  36   1.44  52   2.08  54   2.16  56   2.24  82   3.28  92   3.68  108  4.32
26  27*  1.04  38   1.46  39   1.50  53   2.04  55   2.12  57   2.19  84   3.23  96   3.69  110  4.23
27  28*  1.04  39   1.44  40   1.48  54   2.00  56   2.07  58   2.15  86   3.19  98   3.63  112  4.15
28  29*  1.04  40   1.43  41   1.46  55   1.96  57   2.04  59   2.11  88   3.14  100  3.57  114  4.07
29  30*  1.03  41   1.41  42   1.45  56   1.93  58   2.00  60   2.07  90   3.10  102  3.52  116  4.00
30  31*  1.03  42   1.40  43   1.43  57   1.90  59   1.97  61   2.03  92   3.07  104  3.47  118  3.93
31  32*  1.03  43   1.39  44   1.42  58   1.87  60   1.94  62   2.00  94   3.03  106  3.42  120  3.87
32  33*  1.03  44   1.38  45   1.41  59   1.84  61   1.91  63   1.97  96   3.00  108  3.38  122  3.81

the $k$ subsets) is also the vector $e_i$. Hence the code is actually a $(k+1)$-server PIR code and we are done. If not,
append one more column to $G$ so that the sum of all the columns is 0. Then the resulting matrix is a generator
matrix for a $(k+1)$-server PIR code. ■
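The parity-column argument in this proof can be tried on a small code. The sketch below is an illustration under assumptions: it takes the 3-server square code with $\sigma = 2$ (so $s = 4$, $m = 8$), represents each generator-matrix column as a bitmask over the information bits, and uses a brute-force search (`pir_servers`, a hypothetical helper) to count disjoint recovery sets before and after appending the extra column.

```python
def recovery_masks(cols, target):
    """All non-empty column subsets (as bitmasks) whose GF(2) sum equals target."""
    m = len(cols)
    masks = []
    for mask in range(1, 1 << m):
        acc = 0
        for j in range(m):
            if (mask >> j) & 1:
                acc ^= cols[j]
        if acc == target:
            masks.append(mask)
    return masks

def max_disjoint(masks, used=0, start=0):
    """Largest number of pairwise disjoint subsets among masks (exhaustive search)."""
    best = 0
    for idx in range(start, len(masks)):
        if masks[idx] & used == 0:
            best = max(best, 1 + max_disjoint(masks, used | masks[idx], idx + 1))
    return best

def pir_servers(cols, s):
    """Largest k such that every e_i has k disjoint recovery sets."""
    return min(max_disjoint(recovery_masks(cols, 1 << i)) for i in range(s))

def extend_with_parity(cols):
    """Append one column so that all columns sum to zero (no-op if they already do)."""
    total = 0
    for c in cols:
        total ^= c
    return cols if total == 0 else cols + [total]

s = 4
# Information columns e_0..e_3, two row parities, two column parities.
base = [0b0001, 0b0010, 0b0100, 0b1000, 0b0011, 0b1100, 0b0101, 0b1010]
k_base = pir_servers(base, s)
extended = extend_with_parity(base)
k_ext = pir_servers(extended, s)
```

Here the appended column is the overall parity of the four information bits, and the code length grows from 8 to 9 while the number of disjoint recovery sets grows from 3 to 4, in line with the entries for $s = 4$ in Table III.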

By selecting the best constructions for $A(s,k)$ from Section IV for each individual $s$ and $k$, and then updating
the table with respect to Lemmas 13 and 14, we are finally able to give an upper bound on $A(s,k)$ for all values
of $s$ and $k$. Table III contains the upper bound obtained on $A(s,k)$ for all $k \le 16$ and $s \le 32$. We observe that
the storage overhead is significantly improved compared to the traditional uncoded PIR scheme. Moreover, the
inequality A{s,k) ^ s + always hold. The asymptotic behavior of A{s,k) is discussed in next section. 

In the remainder of this section, we seek to address the following key question: Is it the generator matrix that
has the $k$-server PIR property, or can it be interpreted as a property of the code? Let us begin the discussion with
the following definition.


Definition 15. We say that an $[m, m-s]$ binary linear code $C$ has property $B_k$ if there exist $s$ cosets of $C$ such that:

a) Every coset contains $k$ disjoint vectors, and

b) The linear span of these cosets is the entire space $\mathbb{F}_2^m$.

Theorem 16. If the code $C$ has the property $B_k$ then its dual code is a $k$-server PIR code.

Proof: We show that Definition 15 and the definition of a $k$-server PIR code are equivalent. Clearly, Definition 15 above is a property of a
code, not a matrix. Now, given a generator matrix $G$ for the PIR code $C^\perp$ we get a code $C$ with property $B_k$
by simply taking $C$ to be the code defined by $G$ as its parity-check matrix. Now let us proceed to the other direction.

Assume that a code $C$ with property $B_k$ is given. Let $C_1, C_2, \ldots, C_s$ be the $s$ linearly independent cosets of $C$,
each containing $k$ disjoint vectors. Start with an arbitrary parity-check matrix $H$ for $C$. Let $\sigma_1, \sigma_2, \ldots, \sigma_s$ denote the
syndromes of $C_1, C_2, \ldots, C_s$ with respect to $H$. Let $S$ be the $s \times s$ matrix having these syndromes as its columns.
Note that condition b) of Definition 15 guarantees that $S$ is full-rank. Now form the $s \times (m+s)$ matrix $[H \mid S]$,
and perform elementary row operations on this matrix to get $[H' \mid S']$ where $S'$ is the $s \times s$ identity matrix. Then
the matrix $H'$ is a generator matrix for the $k$-server PIR code $C^\perp$, which is clearly the dual code of $C$. ■

The following lemma from the theory of linear codes is essential for the proof of part (d) in Lemma 13. We
leave the proof to the reader.

Lemma 17. Let $C$ be an $[m,k]$ binary linear code. Given a positive $t \le m - k$, let $C_1, C_2, \ldots, C_t$ be cosets of $C$, and
let $S_1, S_2, \ldots, S_t$ be their syndromes. Then

\[ \dim(\mathrm{span}(C_1, C_2, \ldots, C_t)) = k + t \]

if and only if the syndromes $S_1, S_2, \ldots, S_t$ are linearly independent.

We are now able to prove part (d) of Lemma 13 as promised earlier.

Proof of Lemma 13 (d): Suppose $A(s,k) = m$. Then there exists an $[m, m-s]$ code $C$ with property $B_k$.
Moreover, no column in a parity-check matrix for $C$ is entirely zero, otherwise $A(s,k) \le m - 1$. Puncture the
code $C$ in any position. Upon puncturing, a) above remains true trivially. It remains to show that we can find some
$s - 1$ cosets of the punctured code that generate $\mathbb{F}_2^{m-1}$, which is a direct result of Lemma 17. Hence the resulting
$[m-1, m-s]$ code has property $B_k$, and by Theorem 16 its dual is a $k$-server PIR code. ■


VI. Asymptotic Behavior of Coded PIR 

While deriving the precise values of $A(s,k)$ was our initial interest, studying the asymptotic behavior of $A(s,k)$ is
no less interesting. In particular, lower bounds will help us to find constructions with the optimal storage overhead.
Asymptotically, we will analyze the value of $A(s,k)$ when $s$ is fixed and $k$ is large, and vice versa. We briefly
mention that we solved the first case while the lower bounds for the latter are yet to be found.


A. Storage overhead for fixed s 

Let us first focus on the case where s, the ratio between the length of the whole data and the storage size of 
each server, is a fixed integer number, but k, the PIR protocol parameter, is large. 


Theorem 18. For any pair of integer numbers $s$ and $k$, we have

\[ A(s,k) \ge \frac{2^s - 1}{2^{s-1}}\, k, \tag{10} \]

with equality if and only if $k$ is divisible by $2^{s-1}$.

Let us use the following example to illustrate the proof.


Examples. Assume s = 3, and G is an (m,3) PIR code with k-server PIR property. The generator matrix of G 
contains m columns, each of length 3. The list of all possible options is shown in table IV Let us assume that the 
column multiplicities are given by pa, Pb/ Pc, Px, Py, Pz, and pm- 

Since the code has the k-server PIR property, there should be k disjoint sets of columns each with ha as their 
sum. ha, hy + hz, he + hy, and hx + hw are all such possibilities. It is easy to notice that there is no combination 
of the columns of type hy, he, and hx that would give ha. So, each of the k sets should include at least one of the 
other columns. Therefore, 


Pa -f Py + Pz-f Pw ^ k. 
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TABLE IV 

List of all 7 different type of columns used in constructing an (m,3)-PIR code. 



Similar to $\{h_b, h_c, h_x\}$, we have three other sets $\{h_x, h_y, h_z\}$, $\{h_c, h_z, h_w\}$, and $\{h_b, h_y, h_w\}$ that are incapable of 
recovering the first data chunk on their own. So we have three more constraints 

$$\mu_a + \mu_b + \mu_c + \mu_w \ge k,$$
$$\mu_a + \mu_b + \mu_x + \mu_y \ge k,$$
$$\mu_a + \mu_c + \mu_x + \mu_z \ge k.$$

Redoing the above argument for the second and the third information chunk, we get the following three new 
constraints 

$$\mu_b + \mu_c + \mu_y + \mu_z \ge k,$$
$$\mu_b + \mu_x + \mu_z + \mu_w \ge k,$$
$$\mu_c + \mu_x + \mu_y + \mu_w \ge k.$$

Each multiplicity appears in exactly four of the seven constraints, so by adding all the above constraints we have 

$$A(3,k) = m = \mu_a + \mu_b + \mu_c + \mu_x + \mu_y + \mu_z + \mu_w \ge \frac{7k}{4}.$$

It is trivial that when k is divisible by 4, setting $\mu_a = \mu_b = \mu_c = \mu_x = \mu_y = \mu_z = \mu_w = \frac{k}{4}$ gives the equality. We 
can indeed use the results from Lemmas 13 and 14 to obtain the exact value of $A(3,k)$. 

□ 
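The counting in this example can be checked mechanically. The following is a minimal sketch, not part of the original argument. It assumes the column labelling $h_a = (1,0,0)$, $h_b = (0,1,0)$, $h_c = (0,0,1)$, $h_x = (0,1,1)$, $h_y = (1,0,1)$, $h_z = (1,1,0)$, $h_w = (1,1,1)$, which is inferred from the sums quoted in the text; the names `COLS`, `constraint_sets`, and `COVERAGE` are hypothetical. It enumerates the two-dimensional subspaces that miss some unit vector and collects their complements as constraints.

```python
from itertools import combinations

# Assumed labelling of the 7 nonzero columns of length 3 (inferred from the text).
COLS = {
    "a": (1, 0, 0), "b": (0, 1, 0), "c": (0, 0, 1),
    "x": (0, 1, 1), "y": (1, 0, 1), "z": (1, 1, 0), "w": (1, 1, 1),
}

def add(u, v):
    """Componentwise sum over GF(2)."""
    return tuple((p + q) % 2 for p, q in zip(u, v))

def constraint_sets():
    """Complements of the 2-dimensional subspaces that miss some unit vector."""
    found = set()
    for plane in combinations(COLS, 3):
        u, v, t = (COLS[name] for name in plane)
        if add(u, v) != t:            # three nonzero vectors span a plane iff u + v = t
            continue
        if any(unit not in plane for unit in "abc"):
            found.add(frozenset(COLS) - set(plane))
    return found

CONSTRAINTS = constraint_sets()
# Number of constraints in which each column multiplicity appears.
COVERAGE = {name: sum(name in c for c in CONSTRAINTS) for name in COLS}
```

Under this labelling there are seven constraints of four columns each, and every multiplicity appears in four of them, which is exactly what turns the sum of the constraints into $4m \ge 7k$.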



Proof of Theorem 18: For general s, the generator matrix contains at most $2^s - 1$ different non-zero 
columns. Assume $\mathcal{C}$ is a k-server PIR code with length m and dimension s. Therefore, for each $1 \le i \le s$, one 
can find k disjoint subsets of the columns with their sum equal to 

$$e_i = (0, \ldots, 0, 1, 0, \ldots, 0)^{\top},$$

the unit vector with a 1 in position i. Similar to the example, we look at all the (s − 1)-dimensional subspaces V in $\mathbb{F}_2^s$ such that $e_i \notin V$. It is clear that 
no combination of the columns in V can retrieve $e_i$. So, each of the k subsets should include at least one vector 
from $V^c$, where $V^c$ denotes the complement of V in $\mathbb{F}_2^s$. Now let V be a subspace of $\mathbb{F}_2^s$ that does not contain the 
unit vector $e_i$. Then $\sum_{v \in V^c} \mu_v \ge k$ is a constraint involving $\mu_{e_i}$. 

There are 

$$\frac{(2^s-2)(2^s-4)\cdots(2^s-2^{s-1})}{(2^{s-1}-1)(2^{s-1}-2)\cdots(2^{s-1}-2^{s-2})} = 2^{s-1}$$

such subspaces for each i, which gives us $2^{s-1}$ constraints for each $e_i$. It suffices to show that there are exactly 
$2^s - 1$ unique constraints after merging all these sets. Now we recall that the non-zero codewords of the simplex 
code of length $2^s - 1$ are precisely the supports of all sets of the form $V^c$, where V is an (s − 1)-dimensional 
subspace of $\mathbb{F}_2^s$ (see [10], page 380 for proof). It is now clear that we have $2^s - 1$ unique constraints since there 
are exactly $2^s - 1$ codewords of weight $2^{s-1}$ in the simplex code of length $2^s - 1$. Moreover, exactly $2^{s-1}$ of these 
codewords have value 1 in a fixed coordinate. In other words, we get the vector $2^{s-1} \cdot \mathbf{1}_{2^s-1}$ as the sum of all these 
codewords. So, 

$$2^{s-1} \sum_{v \in \mathbb{F}_2^s} \mu_v \ge (2^s - 1)k \;\Longrightarrow\; A(s,k) = m = \sum_{v \in \mathbb{F}_2^s} \mu_v \ge \frac{2^s - 1}{2^{s-1}}\, k.$$

■ 
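The counting facts used in the proof can be confirmed by brute force for small s. The sketch below is illustrative only and relies on one standard fact that the proof does not spell out: every (s − 1)-dimensional subspace of $\mathbb{F}_2^s$ is the kernel of a non-zero linear functional, so its complement is the set of vectors with inner product 1 against a fixed non-zero vector. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def hyperplane_complements(s):
    """Return the nonzero vectors of F_2^s and the complement of every hyperplane.

    The hyperplane defined by a nonzero vector a is {v : <a, v> = 0 mod 2},
    so its complement is {v : <a, v> = 1 mod 2}.
    """
    vectors = [v for v in product((0, 1), repeat=s) if any(v)]
    comps = [
        frozenset(v for v in vectors
                  if sum(x * y for x, y in zip(a, v)) % 2 == 1)
        for a in vectors
    ]
    return vectors, comps

def check(s):
    """Verify the counts used in the proof and return the coefficient of k in (10)."""
    vectors, comps = hyperplane_complements(s)
    half = 2 ** (s - 1)
    assert len(set(comps)) == 2 ** s - 1            # distinct constraints
    assert all(len(c) == half for c in comps)       # each constraint has 2^(s-1) terms
    for i in range(s):                              # hyperplanes that miss e_i
        e_i = tuple(int(j == i) for j in range(s))
        assert sum(e_i in c for c in comps) == half
    for v in vectors:                               # each mu_v occurs 2^(s-1) times
        assert sum(v in c for c in comps) == half
    return Fraction(2 ** s - 1, half)
```

For instance, `check(3)` passes all internal assertions and returns the fraction 7/4 that appears in the s = 3 example.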

Let us fix s. The lower bound in (10), along with its equality condition, shows that $A(s,k) = O(k)$ 
when k becomes large. In other words, $A(s,k) \sim 2k$ for small s and large k (the exact coefficient of k in (10) is $2 - 2^{1-s}$). 

B. Storage overhead for fixed k 

It was already shown that for fixed k, there are elementary constructions that achieve storage overhead 
arbitrarily close to 1. We are yet to determine how fast it decreases. Table V summarizes the constructions introduced 
in the previous section with their asymptotic behavior. Note that the explicit formula for the constructions based 
on constant-weight codes is only known for k = 3. 


TABLE V 

Comparison of the constructions for $A(s,k)$ with respect to the asymptotic code redundancy ($A(s,k) - s$). 


Code construction 

Upper bound on A{s,k) 

Asymptotic redundancy 

Cubic construction 

A{s,k) ^ s + {k — 1) 

O(s^-Ai) 

Steiner System 

A ( n(n-l) A ^ «(«-!) 

^ ^ u -1- 

0(sz) 

Type-1 DTI codes (1) 

A ( 220 ^ - (20+1 - 1)^2^ -h 2 ) sS 2201^ - 1 

0(s?) 

Type-1 DTI codes (2) 

A ((22 - 1)^ - 1,2^) ^2i'l’-l 

O(si-i) 

Constant weight codes 

A(a3)^rfi+n 

0(sz) 


We observe that none of the introduced constructions achieves storage overhead less than $1 + O(s^{-1/2})$. However, 
it is not clear if that is the optimum value one can get. So far, the best (and trivial) lower bound is given by 

$$A(s,k) \ge s + O(\log s).$$

VII. PIR Array Codes 

In all the constructions we presented so far, we assumed that the database was partitioned into s parts, where 
every server stores n/s bits that were considered to be a single symbol. In this section we seek to extend this idea 
and let every server store more than a single symbol. For example, we can partition the database into 2s parts of 
n/(2s) bits each such that every server stores two symbols. This can be generalized such that every server stores a 
fixed number of symbols. One of the benefits of this method of constructing PIR codes is that we can support setups 
in which the number of bits stored in a server is n/s where s is not necessarily an integer. Furthermore, we will 
show that it is also possible to improve, for some instances of s and k, the value of $A(s,k)$ and hence the storage 
overhead as well. Since every server stores more than a single symbol, we treat the code construction as an array 
code and thus we call these codes PIR array codes. When a server receives a query q, it responds with multiple 
answers, one for each symbol stored in the server. We illustrate the idea of PIR array codes in 
the next example. 

Example 9. Assume that the database x is partitioned into 12 parts $x_1, x_2, \ldots, x_{12}$ which are stored in four servers 
as follows. 

Server 1           Server 2           Server 3           Server 4
x1                 x2                 x3                 x1 + x2 + x3
x2                 x3                 x1                 x6
x4                 x5                 x4 + x5 + x6       x4
x5                 x6                 x8                 x9
x7                 x7 + x8 + x9       x9                 x7
x8                 x10                x11                x12
x10 + x11 + x12    x11                x12                x10


Thus, every server stores 7 parts, each of n/12 bits, so 7n/12 bits are stored in each server and the storage overhead 
is 7/3. Using this code, it is possible to invoke a 3-server linear protocol $\mathcal{P}(\mathcal{Q}, \mathcal{A}, \mathcal{C})$. Assume Alice seeks to read 
the bit $x_{1,i}$ for $i \in [n/12]$. She invokes the algorithm $\mathcal{Q}$ to receive three queries $\mathcal{Q}(3, n/12; i) = (q_1, q_2, q_3)$. The 
first server is assigned the query $q_1$, the second and fourth servers the query $q_2$, and the third server 
the query $q_3$. Each server responds with 7 answers corresponding to the 7 parts it stores. Alice receives all 28 
answers but only needs 5 answers to retrieve the value of $x_{1,i}$. From the first server she receives the answer $a_1 = 
\mathcal{A}(3, 1, x_1, q_1)$, from the second server she receives two answers $a_2 = \mathcal{A}(3, 2, x_2, q_2)$ and $a_2' = \mathcal{A}(3, 2, x_3, q_2)$, 
from the third server $a_3 = \mathcal{A}(3, 3, x_1, q_3)$, and lastly from the fourth server she receives $a_4 = \mathcal{A}(3, 2, x_1 + x_2 + x_3, q_2)$. 
Note that from the linearity of the protocol $\mathcal{P}$, we have 

$$a_2 + a_2' + a_4 = \mathcal{A}(3, 2, x_2, q_2) + \mathcal{A}(3, 2, x_3, q_2) + \mathcal{A}(3, 2, x_1 + x_2 + x_3, q_2) = \mathcal{A}(3, 2, x_1, q_2),$$

and thus $x_{1,i}$ is retrieved by applying the algorithm $\mathcal{C}$: 

$$x_{1,i} = \mathcal{C}(3, n/12; i, a_1, a_2 + a_2' + a_4, a_3).$$


□ 
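The linearity step in this example can be illustrated with a toy answer function. The sketch below is not the protocol of the paper: it assumes, purely for illustration, that a server's answer is the inner product over GF(2) of the stored part with the query, which is one simple linear map. The part length `L` and all names are hypothetical.

```python
import random

def answer(part, query):
    """Toy linear answer: inner product over GF(2).

    This is an assumed stand-in for the answer algorithm of a linear PIR protocol.
    """
    return sum(p & q for p, q in zip(part, query)) % 2

def xor(*parts):
    """Bitwise sum over GF(2) of equally long bit lists."""
    return [sum(bits) % 2 for bits in zip(*parts)]

random.seed(0)
L = 8  # hypothetical part length n/12
x1, x2, x3 = ([random.randint(0, 1) for _ in range(L)] for _ in range(3))
q2 = [random.randint(0, 1) for _ in range(L)]

a2 = answer(x2, q2)                   # second server, part x2
a2_prime = answer(x3, q2)             # second server, part x3
a4 = answer(xor(x1, x2, x3), q2)      # fourth server, part x1 + x2 + x3
combined = (a2 + a2_prime + a4) % 2   # what Alice computes locally
```

Here `combined` equals `answer(x1, q2)`, the answer the second server would have given had it stored $x_1$ directly. This is the only property of the protocol that the emulation uses.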


In the last example, we see that we repeated the same code four times. That was done in order to guarantee 
that the number of symbols stored in each server is the same. We could instead show only the first two rows of 
the first code and then claim that by interleaving of the column which stores only one symbol it is possible to 
guarantee that each server stores the same number of symbols. While we saw that in this example it is possible 
to construct PIR codes with more flexible parameters, the download communication was increased and we needed 
only 5 out of the 28 received answers. However, since the number of symbols in each server is fixed (and will be 
in the constructions in this section) the communication complexity order is not changed. 

In general, we refer to an $m_1 \times m_2$ array code as a scheme to encode s information bits $x_1, \ldots, x_s$ into an 
array of size $m_1 \times m_2$. An $(m_1 \times m_2, s)$-server coded PIR protocol is defined in a similar way to the earlier definition of an (m,s)-server coded PIR protocol. We 
formally define PIR array codes. 

Definition 19. A binary $[m_1 \times m_2, s]$ linear code will be called a k-server PIR array code if for every information 
bit $x_i$, $i \in [s]$, there exist k mutually disjoint sets $R_{i,1}, \ldots, R_{i,k} \subseteq [m_2]$ such that for all $j \in [k]$, $x_i$ is a linear function 
of the bits stored in the columns of the set $R_{i,j}$. 

Very similarly to the corresponding theorem for PIR codes, we conclude that if there exists an $[m_1 \times m_2, s]$ k-server PIR array code and a 
k-server linear PIR protocol $\mathcal{P}$, then there exists an $(m_1 \times m_2, s)$-server coded PIR protocol that can emulate the 
protocol $\mathcal{P}$. Next, we give another example of a PIR array code which explicitly improves the storage overhead. 

Example 10. We give here a construction of a [2 × 25, 6] 15-server PIR array code. The 6 information bits are 
denoted by $x_1, x_2, x_3, x_4, x_5, x_6$ and are stored in a 2 × 25 array as follows: 

1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
x1   x1   x1   x1   x1   x2   x2   x2   x2   x3   x3   x3   x4   x4   x5
x2   x3   x4   x5   x6   x3   x4   x5   x6   x4   x5   x6   x5   x6   x6

16        17        18        19        20        21        22        23        24        25
x1+x2+x3  x1+x2+x4  x1+x2+x5  x1+x2+x6  x1+x3+x4  x1+x3+x5  x1+x3+x6  x1+x4+x5  x1+x4+x6  x1+x5+x6
x4+x5+x6  x3+x5+x6  x3+x4+x6  x3+x4+x5  x2+x5+x6  x2+x4+x6  x2+x4+x5  x2+x3+x6  x2+x3+x5  x2+x3+x4
The first row specifies the server number. The other two rows indicate the bits which are stored in each column. 
It is possible to verify that this construction provides a 15-server PIR array code. For example, for the first bit we 
get the following 15 sets: 

{1}, {2}, {3}, {4}, {5}, {6,16}, {7,17}, {8,18}, {9,19}, {10,20}, {11,21}, {12,22}, {13,23}, {14,24}, {15,25} 

where it is possible to retrieve the value of $x_1$ from the bits stored in the columns of each group. 

The number of bits stored in each server in this example is n/3 and thus s = 3. If we had to use the best 
construction of an [m,3] 15-server PIR code, then A(3,15) = 26 servers would be required, while here we used only 25 
servers. Hence, we managed to improve the storage overhead for s = 3 and k = 15. □ 
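The 15-server property of this example can be checked for every information bit, not only the first. The following sketch rebuilds the array under the assumption that columns 1 to 15 hold all pairs of bits in lexicographic order and that columns 16 to 25 each hold a triple sum containing $x_1$ together with the sum of the complementary triple. Each cell is represented by the set of bit indices it sums; the names are hypothetical.

```python
from itertools import combinations

BITS = range(1, 7)

# Columns 1..15: every pair of bits, stored uncoded (one bit per row).
pair_cols = [[{a}, {b}] for a, b in combinations(BITS, 2)]
# Columns 16..25: a triple containing bit 1 and the complementary triple.
sum_cols = [[set(t), set(BITS) - set(t)]
            for t in combinations(BITS, 3) if 1 in t]
columns = pair_cols + sum_cols   # columns[j] is server j + 1

def recovering_sets(bit):
    """Return sets of server numbers, each of which can recover `bit`."""
    sets = []
    for j, col in enumerate(pair_cols, start=1):
        stored = col[0] | col[1]
        if bit in stored:
            sets.append({j})               # the bit is stored in the clear
            continue
        target = stored | {bit}            # sum of the bit and the stored pair
        for h, scol in enumerate(sum_cols, start=16):
            if target in scol:
                sets.append({j, h})        # sum minus the two stored bits
                break
    return sets
```

Calling `recovering_sets(1)` reproduces the list of 15 sets given in the example, and the same procedure yields 15 pairwise disjoint sets for each of the other five bits.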


We extend Example 10 to a general construction of PIR array codes. Let t be a fixed integer, $t \ge 2$. The number 
of information bits is $s = t(t+1)$, the number of rows is $m_1 = t$ and the number of columns is $m_2 = m_2' + m_2''$, 
where $m_2' = \binom{t(t+1)}{t}$ and $m_2'' = \binom{t(t+1)}{t+1}/t$. In the first $m_2'$ columns we simply store all tuples of t bits out of 
the $t(t+1)$ information bits. In the last $m_2''$ columns we store all possible summations of $t+1$ bits. There are 
$\binom{t(t+1)}{t+1}$ such summations and since there are t rows, t summations are stored in each column, so the number of 
columns for this part is $\binom{t(t+1)}{t+1}/t$. We also require that in each of the last $m_2''$ columns every bit appears in exactly 
one summation. Note that Example 10 is a special case of this construction for t = 2. A code generated by this 
construction will be denoted by $\mathcal{C}_{A\text{-}PIR}(t)$. 


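Theorem 20. For any integer $t \ge 2$, the code $\mathcal{C}_{A\text{-}PIR}(t)$ is an $[m_1 \times m_2, t(t+1)]$ k-server PIR array code where 

$$s = t + 1, \quad m_1 = t, \quad m_2 = \binom{t(t+1)}{t} + \frac{1}{t}\binom{t(t+1)}{t+1}, \quad k = \binom{t(t+1)}{t},$$

and its storage overhead is 

$$\frac{m_2}{s} = \frac{\binom{t(t+1)}{t} + \frac{1}{t}\binom{t(t+1)}{t+1}}{t+1}.$$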


Table VI compares the improvement in the number of servers, and thus storage overhead, when using the PIR 
array code $\mathcal{C}_{A\text{-}PIR}(t)$. For t = 2 and s = 3, k = 15 we know the exact value of $A(s,k)$ according to Table III, 
and for all other values of t we get a lower bound on the value of $A(s,k)$ according to Theorem 10. 

TABLE VI 

Comparison between the code $\mathcal{C}_{A\text{-}PIR}(t)$ and the corresponding best values of $A(s,k)$. 

t    s    k         m2        A(s,k)
2    3    15        25        26
3    4    220       385       ≥ 413
4    5    4845      8721      ≥ 9387
5    6    142506    261261    ≥ 280559

The constructions presented in this section are examples for improvements either in the storage overhead or the 
existence of codes with other parameters which cannot be achieved by the non-array PIR codes. We hope that more 
constructions will appear to further improve these parameters. 

VIII. Alternative Constructions 

In this section we discuss several more constructions of coded PIR schemes with special properties. First we 
start with the extension of binary coded PIR schemes to non-binary codes. Then, we show how other extensions 
of PIR schemes, namely robust PIR and coalitions PIR, can be adjusted for the coded PIR setup. 




















A. Non-binary Coded PIR Schemes 


In this section we extend the results from Section IV to the non-binary setup. Since the construction there 
consists of a k-server linear PIR protocol and a k-server PIR code, we require the protocol and the code to be 
over the same field GF(q), where q is a power of a prime number. 

In the extension of the earlier definitions of PIR protocols, we require that the database and the responses of algorithm $\mathcal{A}$ in the 
protocol are over the same field GF(q). Therefore, in the definition of a linear PIR protocol we also require the linearity of $\mathcal{A}$ to be over 
GF(q). The definition of k-server PIR codes remains the same, while the linearity of the sets is over GF(q). For 
any s and k we denote by $A(s,k)_q$ the smallest m such that an [m,s] k-server PIR code exists over the field 
GF(q). The construction of the coded PIR protocol remains almost identical. 

We summarize here the required modifications in this proof, under the assumptions mentioned above. 

1) The database x is partitioned into s parts $x_1, \ldots, x_s$ which are encoded using a generator matrix G over GF(q) 
as before to receive the coded data $(c_1, \ldots, c_m)$, which is stored in the m servers. 

2) Alice wants to read the symbol $x_{\ell,i}$. She invokes the algorithm $\mathcal{Q}$ and receives the k queries $(q_1, \ldots, q_k)$. 

3) We assume that there exist k mutually disjoint sets $R_{\ell,1}, \ldots, R_{\ell,k} \subseteq [m]$, such that for $j \in [k]$, we can write 

$$x_\ell = \sum_{h \in R_{\ell,j}} \alpha_h c_h,$$

where the coefficients $\alpha_h$ are over the field GF(q). 

4) The output of the algorithm $\mathcal{Q}^*(m,s,n;i)$ is assigned as before with the queries, and the answers are received 
as before. 

5) For $j \in [k]$ the value of $a_j'$ is calculated according to 

$$a_j' = \sum_{h \in R_{\ell,j}} \alpha_h a_h = \sum_{h \in R_{\ell,j}} \alpha_h \mathcal{A}(k, j, c_h, q_j) = \mathcal{A}\Big(k, j, \sum_{h \in R_{\ell,j}} \alpha_h c_h, q_j\Big) = \mathcal{A}(k, j, x_\ell, q_j).$$

6) Alice calculates the symbol $x_{\ell,i}$ as before according to 

$$\mathcal{C}(k, n/s; i, a_1', a_2', \ldots, a_k') = \mathcal{C}(k, n/s; i, \mathcal{A}(k, 1, x_\ell, q_1), \ldots, \mathcal{A}(k, k, x_\ell, q_k)) = x_{\ell,i}.$$

We can always use the binary constructions as k-server PIR codes (assuming for example that the code is given 
by a parity check matrix), so we can conclude that $A(s,k)_q \le A(s,k)$. However, we note that the definition of 
k-server PIR codes is very much related to the recently well-studied locally recoverable (LRC) codes |15|, | |2^ , 
I p^ . A code C over GF{q) is said to have locality r if every symbol in each codeword from C can be recovered by 
a subset R of at most r other symbols from the codeword. The set R is called a recovering set of the symbol. A code 
C is called an LRC code with locality r and availability t if every symbol has t pairwise disjoint recovering sets, 
each of size at most r. In case the code is systematic while the locality and availability requirements are enforced 


only on the information symbols then it is called an LRC code with information locality r and availability t [22]. 
Our definition of k-server PIR code is closer to LRC codes with information locality, however we don’t require 
the code to be systematic. Furthermore, the major difference is that we don’t restrict the size of the recovering 
sets. The connection between k-server PIR codes and LRC codes with availability is stated as follows. The proof 
is omitted since it is straightforward. 


Theorem 21. If a code C is an LRC code with information locality r (or locality r) and availability t = k — 1 then it 
is a k-server PIR code. 

For the non-binary setup, there are several constructions of LRC codes with availability; see for example 
[22], [24]. While it is not necessarily immediate to find examples where we get better results, in terms of the 
value of $A(s,k)_q$, than in the binary case, it is still possible to improve the minimum distance of the code. 

The following example demonstrates this idea. 
Example 11. Assume that s = 2 and k = 3. We already saw that A(2,3) = 5; however, the minimum distance of 
such a code is 3, which is optimal. Let us consider the case q = 4; then the two information symbols $x_1, x_2$ are 
encoded to the following five symbols: 

$$(x_1,\; x_2,\; x_1 + x_2,\; x_1 + \alpha x_2,\; x_1 + \alpha^2 x_2),$$

where $\alpha$ is a primitive element in GF(4). It is possible to verify that this is a 3-server PIR code, and its minimum 
distance is 4, where in the binary case we could only have minimum distance 3. □ 
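Both claims of this example can be verified exhaustively, since the code has only 16 codewords. The sketch below represents GF(4) as GF(2)[x]/(x² + x + 1), with the integers 0, 1, 2, 3 standing for 0, 1, α, α². This representation and the particular recovering sets used are choices made here for illustration, not taken from the text.

```python
from itertools import product

def gf4_mul(a, b):
    """Multiply in GF(4) = GF(2)[x]/(x^2 + x + 1); 2 encodes alpha, 3 encodes alpha^2."""
    result = 0
    for i in range(2):
        if (b >> i) & 1:
            result ^= a << i
    if result & 0b100:           # reduce using x^2 = x + 1
        result ^= 0b111
    return result

ALPHA, ALPHA2 = 2, 3

def encode(x1, x2):
    """The five coded symbols; addition in GF(4) is bitwise XOR."""
    return (x1, x2, x1 ^ x2, x1 ^ gf4_mul(ALPHA, x2), x1 ^ gf4_mul(ALPHA2, x2))

def min_distance():
    """Minimum Hamming weight over all nonzero codewords (the code is linear)."""
    return min(sum(c != 0 for c in encode(x1, x2))
               for x1, x2 in product(range(4), repeat=2) if (x1, x2) != (0, 0))

def recover_x1(c):
    """Three disjoint recovering sets for x1: {1}, {2,3}, {4,5}."""
    return [c[0], c[1] ^ c[2], gf4_mul(ALPHA2, c[3]) ^ gf4_mul(ALPHA, c[4])]

def recover_x2(c):
    """Three disjoint recovering sets for x2: {2}, {1,3}, {4,5}."""
    return [c[1], c[0] ^ c[2], c[3] ^ c[4]]
```

The third recovering set for $x_1$ uses the identity $\alpha^2 + \alpha = 1$ in GF(4), so that $\alpha^2(x_1 + \alpha x_2) + \alpha(x_1 + \alpha^2 x_2) = x_1$.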


B. Robust PIR and t-private PIR 

Lastly, we briefly note here that our constructions of k-server PIR codes can be used also to construct coded PIR 
protocols for robust PIR and t-private PIR [29]. 

A k-out-of-ℓ PIR protocol is a PIR protocol with the additional property that Alice can compute the value of 
$x_i$ even though she received only k out of the ℓ answers. In order to emulate such a protocol $\mathcal{P}$ we simply use an 
[m, s] k-server PIR code and repeat the same steps as in the construction of the coded PIR protocol. Then, we can emulate the protocol $\mathcal{P}$, and if 
at most ℓ − k answers were not received, then Alice will still be able to privately recover the value of the bit $x_i$. 

A t-private PIR protocol is a PIR protocol where every collusion of up to t servers learns no information on 
the bit Alice seeks to read from the database. Given a t-private PIR protocol $\mathcal{P}$, we follow again the same steps 
to construct an (m,s)-server coded PIR protocol $\mathcal{P}^*$. Since the protocol $\mathcal{P}$ is t-private, we get also 
that every collusion of t servers learns no information on i, the index of the bit that Alice attempts to read. This property results 
from observing that every t servers have together at most t out of the queries that Alice sends to the servers, and 
according to the t-privacy property of the protocol $\mathcal{P}$, the same privacy is preserved for the protocol $\mathcal{P}^*$ as well. 


IX. Conclusions and Open Problems 

A new framework to utilize private information retrieval in distributed storage systems is introduced in this 
paper. The new scheme is based on the idea of using coding instead of the replication used in traditional PIR 
protocols, when the storage size of each server is much less than the size of the database. We have shown that 
among the three main parameters in measuring the quality of k-server PIR protocols, i.e., communication complexity, 
computation complexity, and storage overhead, the first two remain the same and the last improves significantly in 
the asymptotic regime. In particular, for a fixed k and a limited server size, the storage overhead becomes 1 + o(1) 
as the number of servers becomes large. 

The optimal storage overhead with coded PIR is also studied and the explicit value is derived for many cases. 
The presented constructions lead to coded PIR schemes with storage overhead $1 + O(s^{-1/2})$ for any fixed k, where 
s is the ratio between the size of the database and the storage size of each server. Hence, it will be interesting 
to determine whether this asymptotic behavior can be improved. Another research direction is the construction of 
other coded schemes which are compatible with existing PIR protocols, such as the ones given in Sections VII 
and VIII. 